COMPLEX ANALYSES OF VARIANCE: GENERAL PROBLEMS*


Bert F. GREEN, JR. 
LINCOLN LABORATORY, MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
AND 
John W. TUKEY
PRINCETON UNIVERSITY 


Problems in applying the analysis of variance are discussed. Emphasis 
is placed on using the technique to understand the data. The scale of the 
dependent variable is important for the analysis. Crossed and nested cate- 
gories must be recognized. The error terms in the analysis depend on whether 
the classes of each independent variable are (1) all out of a few or (2) a few 
out of many. To simplify the analysis, mean squares should be aggregated 
with their error term when they are less than twice its size. An illustrative 
example is discussed in detail. 


The framework of experiment and the additive decomposition of observations
that are associated with the analysis of variance form what is perhaps
our single most powerful methodological technique. Kogan [14] has recently
reviewed the use of this technique in psychological research. His review 
shows that the analysis of variance has often been used with simple ex- 
perimental designs involving only two or three factors, but that instances 
of its use in complex experiments are rare. It is our purpose to discuss some 
of the problems that arise in complex analysis of variance, and to suggest 
methods for dealing with these problems. To be discussed in the present 
paper are (1) the choice of the dependent variable to be analyzed, (2) the 
shape of the analysis, (3) the choice of the proper error term, and (4) the 
aggregation and pooling of mean squares. 

Throughout the discussion we shall emphasize what may be considered 
the major purposes of the analysis of variance: to provide a simple summary 
of the variation in the experimental data, and to indicate the stability of 
means and other meaningful quantities extracted from the data (and thus 
to make more precise our understanding of how much has been learned from 
the experiment). Many investigators believe that the sole purpose of the 
analysis of variance is to provide statistical tests of significance, and some 
seem to equate these to tests of meaningfulness. We hope to counteract 
such views by showing how the analysis of variance can be used to summarize 
the data effectively and to help in understanding what “goes on” in the
experimental situation. While we shall rely on the conventional F-test to 
give some guidance, the primary function of the analysis of variance is to 
help the investigator understand his data. As such, it may need to be used 
more than once on the same data. As such, it deserves guidance from graphs 
and other devices for seeking understanding. It should not be an end in itself. 


The Illustrative Example 


The psychophysical experiment that serves as the illustration for the 
discussion is that used by Johnson and Tsao [12] as a methodological example. 
The detailed observations are in Johnson’s textbook [11]. In the experiment, 
difference limens for weights were measured by a method of continuous 
change. An aluminum pail was attached by a lever system to a ring on the 
subject’s finger. One of seven weights—100, 150, 200, 250, 300, 350, or 400 
grams—was placed in the pail. By controlling a flow of water into the pail, 
the experimenter increased the load in the pail at one of four constant rates— 
50, 100, 150, or 200 grams per 30 seconds—until the subject reported a 
change in pull. The difference limen, DL, in grams, was measured by the 
amount of water added. 

Four men and four women served as subjects. Two of each sex were 
congenitally blind, the others had normal vision. Age was partially con- 
trolled; all women were 33, while one blind and one normally sighted man 
were 21, and the two remaining men were 25. 

Two practice trials were given at the start of the first session. In the 
experiment the order of presentation of the weight-rate combinations was 
randomized. For each weight-rate combination, five determinations of the 
DL were made for each subject. The mean of these five measurements was 
used in the analysis. The entire experiment was carried out on each of two
days, one week apart. 


Choice of the Dependent Variable 


In setting up an analysis of variance, we must first decide what scale 
to use for the dependent variable. We want to choose a scale that will yield 
the simplest relations with the independent variables. By simplest relations 
we mean, for example, fewer important interactions, and larger main effects 
relative to the error variance. A change of variable that nearly removes a 
particular main effect also usually leads to very revealing results. Secondarily, 
we would like the dependent variable to have approximately homogeneous 
variance within cells of the design. (Indeed, if nothing else could be altered, 
some attention might be given to making the within-cells distribution
approximately normal.)

Past experience with similar data often suggests the appropriate scale 
to use. In other cases the results of one analysis may suggest a reanalysis
with a transformed dependent variable. In Johnson and Tsao’s experiment, 
the DL might be expressed as a difference in weight, a squared difference in 
weight, a ratio of weights, or a logarithm of a ratio of weights. It might also 
be expressed as a response time. In fact, analysis of the experiment will show 
that the response-time measure leads to a strikingly simple account of the
data.


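The candidate scales listed above can be computed from a single DL measurement, as in the following minimal sketch. The function name `candidate_scales`, the input values, and the particular form of the ratio, (W + DL)/W, are illustrative assumptions and are not taken from Johnson and Tsao. The response time follows from the constant rate of flow, since the rates are quoted in grams per 30 seconds.

```python
import math

def candidate_scales(dl_grams, weight_grams, rate_grams_per_30s):
    """Express one difference limen (DL) on several candidate scales.

    Assumption: the "ratio of weights" is taken as (W + DL) / W; the source
    does not spell out which ratio is meant.
    """
    ratio = (weight_grams + dl_grams) / weight_grams
    return {
        "difference": dl_grams,                  # grams of water added
        "squared_difference": dl_grams ** 2,
        "ratio": ratio,
        "log_ratio": math.log(ratio),
        # Water flows at a constant rate, so time = amount / rate.
        "response_time_s": 30.0 * dl_grams / rate_grams_per_30s,
    }

# Hypothetical observation: DL of 25 g on a 100 g weight at 50 g per 30 s.
scales = candidate_scales(dl_grams=25.0, weight_grams=100.0, rate_grams_per_30s=50.0)
print(scales["response_time_s"])  # 15.0 seconds
```

Each scale is a one-to-one function of the DL for a fixed weight and rate, so the choice among them changes the form of the relations with the independent variables, not the information in the data.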
The Shape of the Analysis 


The next step in setting up an analysis of variance is to specify for 
what main effects, interactions, etc., it would be appropriate to compute 
mean squares. In many texts the simplest experimental designs are described 
as “single classification,” “double classification,” “triple classification,” etc.
In this terminology, the present experiment has six classifications: sex, 
sight, weight, rate, date, and person. The relations among these classifications 
determine what interactions are available. 

The most basic relation between two classifications is crossing. Two 
classifications are completely crossed if each value of one classification is 
paired with each value of the second classification. Thus rate and weight 
are completely crossed in the present experiment, since each selected weight 
appears with each selected rate. Sex and sight are similarly crossed. In this 
experiment, not only does each combination occur, but every combination 
occurs an equal number of times—the experiment is balanced. Completeness 
and balance of crossing greatly facilitate both analysis and interpretation of 
results. They are often built into planned experiments, although they occur 
much less frequently in unselected situations. Height and weight are incom- 
pletely crossed in human populations. Such combinations as height 3 feet, 
weight 300 pounds, or height 7 feet, weight 30 pounds are nonexistent, or 
at least very unlikely. When all variables are crossed in the experiment, all 
combinations of variables may be represented by interaction terms in the 
analysis. 
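Whether two classifications are completely crossed, and whether the crossing is balanced, can be checked directly from the list of observations. The sketch below is illustrative only; the function name `crossing_status` is hypothetical, and the count of 16 observations per cell assumes the 8 subjects and 2 dates of the present experiment.

```python
from collections import Counter
from itertools import product

def crossing_status(pairs, levels_a, levels_b):
    """Classify how two classifications are crossed.

    pairs: one (a, b) tuple per observation.
    Returns (completely_crossed, balanced).
    """
    counts = Counter(pairs)
    cells = list(product(levels_a, levels_b))
    complete = all(counts[cell] > 0 for cell in cells)
    balanced = complete and len({counts[cell] for cell in cells}) == 1
    return complete, balanced

rates = [50, 100, 150, 200]
weights = [100, 150, 200, 250, 300, 350, 400]
# 8 subjects x 2 dates = 16 observations in each rate-weight cell.
observations = [(r, w) for r in rates for w in weights for _ in range(16)]
full_status = crossing_status(observations, rates, weights)
print(full_status)        # (True, True)

# Dropping one cell leaves the crossing incomplete.
incomplete = [pair for pair in observations if pair != (50, 400)]
partial_status = crossing_status(incomplete, rates, weights)
print(partial_status)     # (False, False)
```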

A second common relation between classifications is nesting. One classi- 
fication is nested in another classification, if the inner has meaning only 
within a single class of the outer. Thus person is nested in sex, since any 
particular person will be of a particular sex. When one classification is nested 
in a second, one cannot average over the outer classification without auto- 
matically averaging over the inner, or nested, classification. Nesting occurs, 
for example, in psychological experiments when subjects are divided into 
groups. The subjects are thus nested in the groups. The analysis includes 
a main effect and a set of interactions (again with other variables) for subject 
within group. In many replicated experiments, the replication is a special 
case of a nested classification. The replication is nested in the combination 
of all other variables in the experiment, since one must average over replications
before averaging over any other classification. In other replicated
experiments, replication is crossed with some or all of the other variables. 
In Johnson and Tsao’s experiment sex, sight, weight, rate, and date 
are all crossed, while person is nested in the combination of sex by sight. 
One cannot obtain such interactions as person × sex, person × sight × rate,
etc., since one cannot average over sex or sight without first averaging over
persons. There are such interactions as weight × person, but this is actually
the weight × person (sex × sight) term, and is interpreted as the interaction
of weight with “persons within the sex-sight combinations.” Since
the analysis of such situations is less familiar than for the completely crossed 
case, some sample computational formulas are given in the appendix. 
Many experiments exhibit only combinations of crossing and nesting. 
The shape of the analysis can always be determined by identifying those 
means that are meaningful. In other experiments, more complicated relation- 
ships may occur. In cross-nesting, a variable which could be crossed with 
another is forced to nest within it instead. In complete or partial confounding, 
two or more variables are “mixed up” together to a greater or lesser extent.
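For the present experiment, the shape of the analysis can be enumerated mechanically: rate, weight, and date are crossed with everything, while person enters only as “person within sex-sight cell.” The sketch below lists every available line with its degrees of freedom. The letter codes (I for sight, as in Table 1) and the function name `lines_with_df` are labeling choices made here, and the degrees of freedom use the standard rules (classes minus one for a crossed classification, cells times persons-per-cell minus one for the nested one), which the text does not state explicitly.

```python
from itertools import combinations

# Degrees of freedom of the fully crossed classifications: 4 rates, 7 weights, 2 dates.
crossed_df = {"R": 4 - 1, "W": 7 - 1, "D": 2 - 1}
# Sex, sight, their interaction, and persons within the 2 x 2 sex-sight cells
# (2 persons per cell, hence 4 * (2 - 1) degrees of freedom).
subject_df = {"S": 1, "I": 1, "SI": 1, "P": 4 * (2 - 1)}

def lines_with_df():
    """List every line (main effect or interaction) with its degrees of freedom."""
    lines = {}
    for k in range(len(crossed_df) + 1):
        for combo in combinations(crossed_df, k):
            df = 1
            for factor in combo:
                df *= crossed_df[factor]
            prefix = "".join(combo)
            if prefix:
                lines[prefix] = df
            for label, sdf in subject_df.items():
                lines[prefix + label] = df * sdf
    return lines

lines = lines_with_df()
print(len(lines), sum(lines.values()))  # 39 lines, 447 degrees of freedom
```

The total of 447 is one less than the 448 cell means (8 persons × 4 rates × 7 weights × 2 dates), which is a quick check that no line has been omitted or counted twice. Lines such as person × sex never appear, because P is always read as P within the sex-sight combination.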


Variance Components and the Proper Error Term 


There has been some confusion about the appropriate error term against 
which the main effects and interactions should be evaluated. Should all 
mean squares, or lines, be tested against the residual term, or should some 
lines be tested against one or more of the interaction terms? The solution 
to this problem depends on a choice of a model for the experiment. The 
analysis of variance is based on a model in which the dependent variable 
consists of the sum of several contributions; these contributions depend 
upon the nature of the independent variable and the way in which the ex- 
periment was performed. There is no definite model which is appropriate for 
all instances, even for all instances of what may seem, at first glance, to be 
exactly similar experiments. However, many models can be regarded as 
special cases of a single, flexible, general model. 

In general, models differ in the assumptions that are made about inter- 
actions, and about the nature of the variables, i.e., whether the values at 
which each variable appears are fixed, random, or of some intermediate nature. 
It has been shown ([22]; [6]; [3], pp. 366 ff; [25]) that there is no need to make 
restrictive assumptions about interactions in order to obtain the most impor- 
tant results—those that determine the correct error term. The fixed and ran- 
dom cases can be interpreted as the extreme cases of sampling of actual values 
from a (usually finite) population of potential values (see same references). 
Thus models for the analysis of variance have been greatly expanded since 
the appearance in 1947 of the, now classical, paper by Eisenhart [8]. Dis- 
cussion of this development may be found in Mood [15], Kempthorne [13], 
Anderson and Bancroft [1], Crump [7], Wilk and Kempthorne [26], Scheffé
[19], and Bennett and Franklin [3].

To illustrate and explain the general model, an auxiliary example from 
the field of psychological testing has been chosen. A number of persons 
i = 1, 2, ..., p are selected at random from a population of P persons, and
a number of possibly equivalent tests j = 1, 2, ..., t are selected at random
from a population of T tests. Each selected test is given to each selected
person. Two scores are obtained for each test by a split-half method. 

A model will be presented for these data that includes populations of P 
persons, T tests and two halves. One can then treat the experimental persons,
and tests, as samples from the populations of persons, and tests. When a 
sample exhausts the population, the corresponding variable is fixed; when 
the sample is a small (i.e., negligible) part of the population the correspond- 
ing variable is random. The model is 


$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$

where


$\mu$ = common or general contribution,
$\alpha_i$ = contribution specific to person i,
$\beta_j$ = contribution specific to test j,
$\gamma_{ij}$ = contribution specific to the combination of person i on test j,
$\epsilon_{ijk}$ = contribution specific to the combination of person i, test j,
and half k,


and where i runs from 1 to P (over the whole population of persons), j runs
from 1 to T, and k runs from 1 to 2.

It is assumed here that half is nested in the combination of test and 
person, and is essentially a replication within this combination. If each 
test has been split into two equivalent subtests, this is the reasonable assump- 
tion. If the halves are not really equivalent, so that, for example, a certain 
sort of individual might do systematically better on half 1 of test j than on
half 2 of the same test, while a second sort would do the reverse, then this 
assumption would be unreasonable. In the latter case, one would have to 
add an interaction between persons and “halves within tests.” If the halves 
were the first and second halves of speeded tests, then one would also have 
to add a main effect of half, and to allow persons to interact directly with 
“half.” For the present example suppose that neither of these complications 
has arisen. 

It is reasonable to require that the average of $\gamma_{ij}$ over all i in the
population does not depend on j. This can be done since any apparent dependence
on j is certainly specific for the test (in this population) and is represented
by $\beta_j$. Notice that it would not make good sense to make a similar
requirement for the means of $\gamma_{ij}$ over only the selected persons, since
such a requirement would force one to let the $\beta$'s depend on the sample of
persons selected. In order for the model to apply to the population, the
$\beta$'s cannot depend on any particular sample from the population. Similarly,
one requires that the average of $\gamma_{ij}$ over all T tests, j = 1, 2, ..., T,
used and unused, should not depend on i.

Since the $\epsilon$'s represent the error variation in the model, we assume,
hopefully and for simplicity, that they represent truly erratic error, and can
themselves be treated as a sample of 2PT values from an infinite population.
For the sake of the analysis, it is assumed that there is a very large—infinite— 
number of ways to split each test into equivalent halves. This assumption 
is closely related to the assumption that the halves are really equivalent. 

Now define the variance components for persons, tests, interactions, 
and between-halves as follows: 


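$$\sigma_P^2 = \frac{1}{P-1} \sum_i (\alpha_i - \alpha_-)^2,$$

$$\sigma_T^2 = \frac{1}{T-1} \sum_j (\beta_j - \beta_-)^2,$$

$$\sigma_{PT}^2 = \frac{1}{(P-1)(T-1)} \sum_i \sum_j (\gamma_{ij} - \gamma_{--})^2,$$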


$$\sigma^2 = \frac{1}{PT-1} \sum (\epsilon_{ijk} - \epsilon_-)^2,$$


where the summations are over whole populations, and not merely over the 
indices appearing in the sample, and where “−” subscripts refer to averages
over the corresponding populations, that is, 


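$$\alpha_- = \frac{1}{P} \sum_i \alpha_i, \qquad \beta_- = \frac{1}{T} \sum_j \beta_j, \qquad \gamma_{--} = \frac{1}{PT} \sum_i \sum_j \gamma_{ij},$$

$\epsilon_-$ = average of $\epsilon$ in an infinite population.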

With these definitions, Tukey [22], Cornfield and Tukey [6], Bennett and 
Franklin [3], and Wilk [25] have shown that in repetitions of the whole 
procedure, each time independently selecting new samples of persons, and 
of tests, the average values of the mean squares found by analysis of variance 
are 


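for persons:     $\sigma^2 + 2(1 - t/T)\,\sigma_{PT}^2 + 2t\,\sigma_P^2$;
for tests:       $\sigma^2 + 2(1 - p/P)\,\sigma_{PT}^2 + 2p\,\sigma_T^2$;
for interaction: $\sigma^2 + 2\sigma_{PT}^2$;
between halves:  $\sigma^2$.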
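How the four mean squares themselves are obtained from a p × t × 2 array of scores can be sketched as follows. This is a minimal pure-Python illustration with made-up scores; the function name `mean_squares` is hypothetical, and the formulas are the standard ones for a two-way crossed layout with two replicates (halves) per cell.

```python
def mean_squares(y):
    """Mean squares for scores y[i][j][k]: person i, test j, half k (k = 0, 1)."""
    p, t, h = len(y), len(y[0]), 2
    cell = [[sum(y[i][j]) / h for j in range(t)] for i in range(p)]
    person = [sum(cell[i]) / t for i in range(p)]
    test = [sum(cell[i][j] for i in range(p)) / p for j in range(t)]
    grand = sum(person) / p

    ss_persons = h * t * sum((m - grand) ** 2 for m in person)
    ss_tests = h * p * sum((m - grand) ** 2 for m in test)
    ss_interaction = h * sum(
        (cell[i][j] - person[i] - test[j] + grand) ** 2
        for i in range(p) for j in range(t)
    )
    ss_halves = sum(
        (y[i][j][k] - cell[i][j]) ** 2
        for i in range(p) for j in range(t) for k in range(h)
    )
    return {
        "persons": ss_persons / (p - 1),
        "tests": ss_tests / (t - 1),
        "interaction": ss_interaction / ((p - 1) * (t - 1)),
        "between_halves": ss_halves / (p * t * (h - 1)),
    }

# Hypothetical scores for 2 persons x 2 tests x 2 halves.
scores = [[[1, 3], [5, 7]],
          [[2, 4], [10, 12]]]
ms = mean_squares(scores)
print(ms)
```

For these made-up scores the four sums of squares (18, 72, 8, and 8) add to the total sum of squares about the grand mean, 106, which is the additive decomposition on which the analysis rests.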
Here, as in many simple situations, the variance components $\sigma^2$, $\sigma_P^2$, $\sigma_T^2$, $\sigma_{PT}^2$
are defined as the mean squares that would correspond to the analysis of
variance of all the possible values in the model. In this example, an infinite
number of halves of each of the T tests applied to each of the P persons
(without practice effect) define this analysis. 

Clearly, the average mean squares depend on the relation of t to T, and
of p to P. If t = T, and p = P, one has a completely fixed model, since the
sample has exhausted the population. Repetitions would lead to exactly
the same set of values for the $\alpha$'s and $\beta$'s. In this case, the proper error term
for the main effects is the between-halves mean square. If t and p are,
respectively, very much smaller than T and P, so that t/T and p/P are
essentially zero, one has a random model so far as persons and tests are concerned.
Here repeated sampling would lead to different values of the $\alpha$'s and $\beta$'s.
The proper error term for the main effects is now the interaction mean square.
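The dependence of the error term on t/T and p/P can be seen by evaluating the average mean squares at the two extremes. The sketch below codes the four expressions given above; the function name, argument names, and the numerical variance components are hypothetical, and an infinite population is represented by `float("inf")`.

```python
def average_mean_squares(var_err, var_pt, var_p, var_t, p, P, t, T):
    """Average mean squares for the persons x tests x halves model.

    P or T may be float("inf") to represent "a few out of many".
    """
    return {
        "persons": var_err + 2 * (1 - t / T) * var_pt + 2 * t * var_p,
        "tests": var_err + 2 * (1 - p / P) * var_pt + 2 * p * var_t,
        "interaction": var_err + 2 * var_pt,
        "between_halves": var_err,
    }

INF = float("inf")
# Hypothetical components, with no true person or test differences.
null = dict(var_err=1.0, var_pt=3.0, var_p=0.0, var_t=0.0)

fixed = average_mean_squares(p=5, P=5, t=4, T=4, **null)
random_ = average_mean_squares(p=5, P=INF, t=4, T=INF, **null)

print(fixed["persons"], fixed["between_halves"])    # 1.0 1.0
print(random_["persons"], random_["interaction"])   # 7.0 7.0
```

When the person component is zero, the persons line has the same average as the between-halves line in the fixed case and the same average as the interaction line in the random case; that equality under the null hypothesis is what identifies the proper denominator.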

One might, alternatively, have a mixed model, with p/P = 0, and t = T.
The average mean squares would now be 


for persons:     $\sigma^2 + 2t\,\sigma_P^2$;
for tests:       $\sigma^2 + 2\sigma_{PT}^2 + 2p\,\sigma_T^2$;
for interaction: $\sigma^2 + 2\sigma_{PT}^2$;
between halves:  $\sigma^2$.


To inquire whether there are differences among the persons is equivalent to 
inquiring whether $\sigma_P^2 = 0$. If it vanishes, then the average mean square
for persons equals the average mean square between halves. The correct 
F-test is thus the ratio of the two corresponding mean squares. 

To inquire whether there are differences among the tests is to inquire
whether $\sigma_T^2 = 0$. If it vanishes, the average mean square for tests equals the
average mean square for interaction. Thus the correct F-test has an inter- 
action mean square, and not a between-halves mean square in the denominator. 

The difference in the denominators seems strange at first, but on further 
consideration it becomes very meaningful. When one speculates about the 
equivalence of tests for the whole population of persons, one finds that all 
the tests have been applied to only a few out of many persons. Thus if persons 
and tests interact, another sample of persons would give systematically 
different comparisons of tests. The interaction mean square must be used. 
Things are quite different indeed when comparing persons. Every test has 
been applied to each selected person; redrawing the sample of tests would 
make no difference since one still uses the same tests. Here one should use 
the between-halves mean square. 
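The choice of denominators in this mixed model can be written out as a small sketch. The function name `mixed_model_f_ratios` and the mean squares are hypothetical; only the pairing of numerators and denominators comes from the argument above.

```python
def mixed_model_f_ratios(ms):
    """F-ratios when persons are random (p/P = 0) and tests are fixed (t = T)."""
    return {
        # Persons: the AvMS has no interaction component, so test against halves.
        "persons": ms["persons"] / ms["between_halves"],
        # Tests: the AvMS contains the interaction component, so test against it.
        "tests": ms["tests"] / ms["interaction"],
        # Interaction: its AvMS exceeds the between-halves AvMS only by its own component.
        "interaction": ms["interaction"] / ms["between_halves"],
    }

# Hypothetical mean squares, for illustration only.
observed = {"persons": 18.0, "tests": 72.0, "interaction": 8.0, "between_halves": 2.0}
ratios = mixed_model_f_ratios(observed)
print(ratios)  # {'persons': 9.0, 'tests': 9.0, 'interaction': 4.0}
```

Testing the tests line against the between-halves mean square instead would give 36 rather than 9 here, overstating the evidence whenever persons and tests interact.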

In rare cases t/T and p/P might be finite fractions, say 0.8 and 0.1.
Then the appropriate error term for the main effects would be a corresponding 
weighted average of the interaction and between-halves mean squares.
(Bennett and Franklin [3] and Cornfield and Tukey [6] give the explicit 
formulas.) Similar situations occur in examining the mean square for one 
variable when the interactions of this variable with at least two other variables 
are significant. In these cases, the F-distribution is surely an approximation, 
and a modified technique is needed to find an appropriate denominator. The 
procedure used in the analysis presented below is described in the Appendix. 

Except for scales, i.e., quantitative variables (whose treatment must be 
postponed) and certain rather rare cases, the classification in an experiment 
will usually be of two types. 


(i) A finite number of classes, each of which is represented in 
the experiment. 

(ii) A large finite, or infinite, number of classes, only a few of 
which are represented in the experiment. 


When we have decided the type of each of the classifications in the experi- 
ment—fixed (i) or random (ii)—then we can determine the average mean 
square for each line in the analysis. The rule is that the average mean square 
for a line includes the variance components for the interaction of that line 
with random classifications or combinations of random classifications. For 
example, if the classifications are labeled A, B, C, D, E, etc., and if C is fixed 
and D is random, then the AvMS for AB would include variance components 
for ABD but not for ABC or ABCD. If both C and D are random, it would
include ABC, ABD and ABCD. The same rule applies for nested variables:
if C is nested in B then the AvMS for AB would include the variance com- 
ponent ABC if C is random, but not if C is fixed. Note carefully that the types 
of A and B are irrelevant when choosing the error term for AB. 
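The rule can be applied mechanically. The following sketch covers completely crossed classifications only (nesting is not handled), uses one-letter names for the classifications, and restricts the example to four of them; the function name `avms_components` is hypothetical.

```python
from itertools import combinations

def avms_components(line, all_factors, random_factors):
    """Variance components in the average mean square (AvMS) of a line.

    line, all_factors, random_factors: strings of one-letter classification names.
    The line's own component is always included; further components are its
    interactions with every combination of the random classifications not in it.
    """
    others = [f for f in all_factors if f not in line and f in random_factors]
    components = []
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            # Keep the letters in the order given by all_factors.
            name = "".join(f for f in all_factors if f in line or f in extra)
            components.append(name)
    return components

one_random = avms_components("AB", "ABCD", random_factors="D")
two_random = avms_components("AB", "ABCD", random_factors="CD")
print(one_random)   # ['AB', 'ABD']
print(two_random)   # ['AB', 'ABC', 'ABD', 'ABCD']

# The types of A and B themselves do not change the answer.
same = avms_components("AB", "ABCD", random_factors="ABD")
```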


Variance Components in the Illustrative Example 


In the difference-limen experiment, person is clearly random, and sex 
is clearly fixed. Although we could conceive of a continuum on which ‘“‘com- 
plete sight’’ and “total blindness” are extreme, we naturally assume that in 
this experiment attention is directed at these extremes as special cases, so 
it is best to consider sight as fixed. 

In the case of date we must beware of a practice effect. Insofar as the 
two dates bring in weather conditions, nearness to a week-end, etc., they 
may be reasonably well represented as a sample of two from a large number, 
but insofar as practice or learning is concerned they are two very particular 
stages. Since only two practice trials were given before the experiment was 
begun, there is no reason to suppose that the practice effect is so small as to 
be unmeasurable. The importance of this effect depends on the extent to 
which the habits, skills, and attitudes developed during the first session 
were retained over the intervening week between the two sessions. In any 
event, all our data refer to subjects in this general state of moderate practice,
and the main effects of all variables except date must, therefore, refer to 
moderate practice. Since it seems likely that the contribution of the date- 
to-date difference to the effects and interactions of other variables is more 
likely to be irregular than systematic, we decide to treat these two dates as 
a small sample from many possible dates. 

Rate and weight present problems. We apparently need a third type of 
variable, a scale, a few categories of which are selected for the experiment. 
These values have very probably been spaced at regular intervals along the 
scale, so we cannot interpret them strictly as a small random sample from 
many possible classes. On the other hand, we are not interested only in the 
particular selected classes. Instead, these classes are used to represent the 
continuous variable in the analysis. Two different procedures can be adopted 
in analyzing classifications that represent scales. One is simple and familiar, 
but requires a somewhat cavalier interpretation of the variance components; 
the other is unfamiliar, but more rigorous and effective. 

The simple technique is to treat scales as categories, using the “few out 
of many” model. This technique will show whether the scale has any sig- 
nificant main effects or interactions, since the null hypothesis is not violated 
by the technique. However, the size of the variance components will depend 
on the range of selected values from the scale. For example, if there is a 
monotonic relation between the scale and the variable being analyzed, the 
variance components will usually increase as the range studied increases. 
The variance components for scales, then, indicate the relative importance 
of the particular range of the scale included in the analysis. In order to 
discover “what is going on,” the investigator must return to the data, per- 
haps graphing the observed relation of the response selected for analysis 
to the scale. The analysis of variance essentially tells him which graphs are 
important. 

The rationale for treating scaled variables as random rather than fixed 
is far from well established. Bennett and Franklin [3] felt that they had to 
adopt one policy or the other, and elected to treat scaled variables as fixed. 
Wilk and Kempthorne [26] likewise treat scales as fixed. The present authors 
chose to treat them here as random. Surely the last word has not yet been 
spoken on this issue. 

While psychologists have usually treated scales as categories in analyzing 
complex experiments, there is a more rigorous procedure in which the nature 
of the relationships between scales and the analyzed variable is an integral
part of the analysis. Orthogonal polynomials may, for example, be used to 
fit curves describing the relationships. The analysis separates the smooth 
effects, represented by the fitted curve (and perhaps summarized by the 
coefficients of the polynomials) from the erratic effects, represented by the 
variation around the fitted curve (see Grant [9]). A detailed exposition of 
curve-fitting techniques will be postponed, since such a discussion in the
present context would introduce complexities that might obscure the main 
points of the argument. Consequently, at present rate and weight are treated 
as random classifications, although we recommend against this procedure 
in general. 

The model for the difference limen experiment can now be represented 
by Table 1, which shows, schematically, the average mean square for each 
line in the analysis. The independent variables are denoted S = sex, I = sight,
P = person, R = rate, W = weight, and D = date. Whenever P is written,
P(SI) is understood, since persons are nested in the SI cells, so that the
presence of a P implies the presence of SI. In our schematic presentation we
have omitted the $\sigma^2$ factors, and also the numerical coefficients. In each case
the coefficient is the number of observations for any set of fixed values of all
the variables represented by the component. For example, the coefficient
of $\sigma_{RP}^2$ is the number of observations for a fixed person at a fixed rate, which
is 7 weights × 2 dates, or 14. Thus, the notation RP in the AvMS column
indicates $14\sigma_{RP}^2$. The error term for each line is immediately apparent from
the table of AvMS’s. 
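The entries of Table 1 can be regenerated from the rule for average mean squares together with the rule for coefficients. The sketch below assumes the classification types adopted in the text (R, W, D, and P random; S and sight fixed), writes sight as I as in the table, and treats P as shorthand for person within sex-sight cell; the function names `div` and `avms` are hypothetical.

```python
from itertools import combinations

LEVELS = {"R": 4, "W": 7, "D": 2, "S": 2, "I": 2}
PERSONS = 8                      # two persons in each of the four S x I cells
TOTAL = 4 * 7 * 2 * PERSONS      # 448 cell means enter the analysis

def div(component):
    """Coefficient: number of observations for fixed values of the component's classes."""
    n = TOTAL
    for f in component:
        # P stands for P(SI), so fixing a person also fixes sex and sight.
        n //= PERSONS if f == "P" else LEVELS[f]
    return n

def avms(line):
    """Schematic AvMS of a line such as 'RW', 'S', 'RWSI', or 'RP', as {component: coefficient}."""
    crossed = [f for f in "RWD" if f in line]
    subject = "".join(f for f in "SIP" if f in line)   # '', 'S', 'I', 'SI', or 'P'
    missing = [f for f in "RWD" if f not in crossed]
    terms = {}
    for k in range(len(missing) + 1):
        for extra in combinations(missing, k):
            rwd = "".join(f for f in "RWD" if f in crossed or f in extra)
            terms[rwd + subject] = div(rwd + subject)
            if subject != "P":
                # Persons are random; their interaction absorbs S and I.
                terms[rwd + "P"] = div(rwd + "P")
    return terms

rw = avms("RW")
print(rw)               # {'RW': 16, 'RWP': 2, 'RWD': 8, 'RWDP': 1}
print(div("RP"))        # 14
```

The result for RW reproduces the expression written out in the footnote to Table 1, and the line whose AvMS lacks only the RW and RWD components is RWP; whether RWP alone can serve as the error term therefore depends on the size of the RWD component.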


Aggregation and Pooling 


In complex experiments there are many mean squares. If it is possible 
to obtain a simple description of the experimental results, many of these 
mean squares will not be significantly different from their error terms. Some 
investigators prefer to pool the nonsignificant mean squares in order to get 
a more stable error term for further significance tests. In our example of the 
split-half tests, the pooling problem comes up when the interaction mean 
square and the between-halves mean square are comparable. In any event, 
a more stable error term would be obtained by pooling the sums of squares 
for interactions and for between-halves and dividing by the pooled number 
of degrees of freedom. If the interactions variance component were zero, this 
pooled error would be appropriate. If the interactions variance component 
were large, then pooling would yield too large an error term for testing lines 
not “above” the interaction term, and too small an error term for testing 
lines “above” the interaction term. When the interaction mean square is 
near the between-halves mean square, the proper balance of these alterna- 
tives—the choice to pool or not to pool—is difficult. 

Two extreme points of view are prevalent, one optimistic and the other 
pessimistic. The optimistic extremists refuse to admit the existence of the 
interactions variance component unless an F-test applied to interactions 
and between-halves mean squares surpasses a chosen level of significance. 
In the absence of significance they pool. This policy is extremely hard to 
justify. The pessimistic extremists refuse to admit that the interactions mean 
square is ever zero, so they never pool. 
TABLE 1

Average Mean Squares (AvMS) and Coefficients (DIV) for DL Experiment

Line    DIV   AvMS (Schematic)*

Lines Not Involving P, S, or I

R       112   RWDP+RWP+RDP+RP+RWD+RW+RD+R
W        64   RWDP+RWP+WDP+WP+RWD+RW+WD+W
D       224   RWDP+RDP+WDP+DP+RWD+RD+WD+D
RW       16   RWDP+RWP+RWD+RW
RD       56   RWDP+RDP+RWD+RD
WD       32   RWDP+WDP+RWD+WD
RWD       8   RWDP+RWD

Lines Involving S But Not I

S       224   RWDP+RWP+RDP+WDP+RP+WP+DP+P+RWDS
              +RWS+RDS+WDS+RS+WS+DS+S
RS       56   RWDP+RWP+RDP+RP+RWDS+RWS+RDS+RS
WS       32   RWDP+RWP+WDP+WP+RWDS+RWS+WDS+WS
DS      112   RWDP+RDP+WDP+DP+RWDS+RDS+WDS+DS
RWS       8   RWDP+RWP+RWDS+RWS
RDS      28   RWDP+RDP+RWDS+RDS
WDS      16   RWDP+WDP+RWDS+WDS
RWDS      4   RWDP+RWDS

Lines involving I but not S are omitted: I, RI, WI, DI, RWI, RDI,
WDI, RWDI; they may be obtained from the corresponding line above
involving S but not I by replacing S by I. E.g.,

RWI       8   RWDP+RWP+RWDI+RWI

Lines involving SI in combination are omitted: SI, RSI, WSI, DSI,
RWSI, RDSI, WDSI, RWDSI; they may be obtained from the corresponding
line above involving S but not I by replacing S by SI and halving
the value of DIV. E.g.,

RWSI      4   RWDP+RWP+RWDSI+RWSI

Lines Involving P

P        56   RWDP+RWP+RDP+WDP+RP+WP+DP+P
RP       14   RWDP+RWP+RDP+RP
WP        8   RWDP+RWP+WDP+WP
DP       28   RWDP+RDP+WDP+DP
RWP       2   RWDP+RWP
RDP       7   RWDP+RDP
WDP       4   RWDP+WDP
RWDP      1   RWDP

* Each set of letters represents a term consisting of a $\sigma^2$ with that
set as subscript, appearing with a coefficient equal to the DIV for the
line of that set. E.g., RWP signifies $2\sigma_{RWP}^2$ wherever it appears. Thus,
in detail, the AvMS for RW is

$$\sigma_{RWDP}^2 + 2\sigma_{RWP}^2 + 8\sigma_{RWD}^2 + 16\sigma_{RW}^2.$$

The subject has been reexamined by Paull [17] who investigated the
power of alternative tests both analytically and empirically. He concluded 
that it is desirable to combine sums of squares and degrees of freedom when 
the F-ratio is less than twice the 50 percent point of the F distribution with 
the corresponding degrees of freedom. As a simpler rule of thumb that gives 
about the same results when each mean square has more than six degrees of 
freedom, he suggested pooling when the F-ratio is less than two. (Notice 
that an empirical rule is used, not a significance test.) Bozivich, Bancroft, 
and Hartley [2] have also studied the pooling problem in detail. Their 
recommendations are more precise and more complicated than Paull’s. The 
differences appear to be critical mainly when there are very few degrees of 
freedom and when the exact level of significance is at issue. Since we are 
mainly concerned with experiments with several degrees of freedom in the 
error terms, and since we attach minor importance to exact significance 
levels, we favor using Paull’s rule of thumb. When mean squares are com- 
bined according to some form of this rule, we shall speak of aggregation 
rather than pooling. 
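The rule of thumb amounts to a few lines of arithmetic. In this sketch the function name `aggregate_if_small` and all numerical values are hypothetical; the threshold of two and the pooling of sums of squares and degrees of freedom are as described above. The more exact criterion based on the 50 percent point of the F distribution is not implemented.

```python
def aggregate_if_small(ms_line, df_line, ms_error, df_error, threshold=2.0):
    """Aggregate a line with its error term when the F-ratio is below the threshold.

    Returns (mean_square, degrees_of_freedom, aggregated) for the resulting error term.
    """
    f_ratio = ms_line / ms_error
    if f_ratio < threshold:
        # Pool sums of squares and degrees of freedom.
        ss = ms_line * df_line + ms_error * df_error
        df = df_line + df_error
        return ss / df, df, True
    return ms_error, df_error, False

# Hypothetical line with F = 12 / 8 = 1.5: aggregated.
small = aggregate_if_small(ms_line=12.0, df_line=6, ms_error=8.0, df_error=14)
# Hypothetical line with F = 40 / 8 = 5: kept separate.
large = aggregate_if_small(ms_line=40.0, df_line=6, ms_error=8.0, df_error=14)
print(small, large)
```

Note that the decision uses a fixed ratio and not a significance level, so a line with few degrees of freedom and a nonsignificant F of 3 would still be kept separate.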

To describe the aggregation procedure, some definitions are needed. 
If the average mean square (AvMS) for one mean square, or line, contains 
all the terms in the AvMS for another line, the first line is above the second
line, and the variance components that appear in the first AvMS and not in
the second are in the difference of the two lines. If the two lines are aggre- 
gated, the variance components in their difference are neglected. 
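With each AvMS represented as a set of component labels, these two definitions become set operations. The sketch uses the schematic entries for the RW and RWP lines of Table 1; the function names `is_above` and `difference` are hypothetical.

```python
def is_above(avms_first, avms_second):
    """True if the first line's AvMS contains every term of the second line's AvMS."""
    return set(avms_second) <= set(avms_first)

def difference(avms_first, avms_second):
    """Variance components in the first AvMS that are absent from the second."""
    return set(avms_first) - set(avms_second)

avms_rw = {"RWDP", "RWP", "RWD", "RW"}    # AvMS for the RW line
avms_rwp = {"RWDP", "RWP"}                # AvMS for the RWP line

print(is_above(avms_rw, avms_rwp))            # True
print(sorted(difference(avms_rw, avms_rwp)))  # ['RW', 'RWD']
```

Aggregating RW with RWP would therefore amount to neglecting both the RW and the RWD components.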

The rules for aggregation are: a line should be aggregated with a basic 
line only when (a) the first line is above the basic line, (b) the first mean 
square is less than twice the basic mean square, (c) no variance component 
in the difference of the two lines has been, or can be, shown by another 
application of (b) to be a variance component that should not be neglected.

The aggregation procedure starts with the highest-order interaction 
or the replication error term; the lines that can be aggregated with it accord- 
ing to these rules are found and aggregated with the initial (basic) line by 
pooling sums of squares and degrees of freedom. Next, we determine which 
lines can be aggregated with the highest-order interaction among those 
lines not already aggregated in the first step. We proceed in this manner 
throughout the table of mean squares, being particularly careful about 
rule (c). A detailed example of the aggregation process is given in the illus- 
trative analysis below. 

Once mean squares have been aggregated according to this generali- 
zation of Paull’s rule of thumb, we are in a position to consider whether 
further pooling is in order. It is conceivable either to keep the once-aggre- 
gated lines carefully separate, or to test the significance of aggregated lines, 
and pool nonsignificant lines with their error terms. The choice between 
these actions must depend on our purpose in making the analysis. 

If our purpose is to identify those lines whose contributions demonstrably
cannot be constant, then the choice between these actions does not matter. In
any event the aggregated values, not the aggregated and pooled values, should
be used as error terms. (If our purpose is to make confidence statements about
these contributions, the situation is the same.)

If our purpose is to summarize the apparent distribution of variability
among the several contributions, assigning variability to only those
contributions which are unequivocally (at the chosen significance level) not
constant, then somewhat of an argument can be made for pooling after
aggregation. For by doing this we force all the variability into “significant”
contributions, which may sometimes be desirable. (The counterargument is that
we know that, in reality, each line corresponds to the contribution of some
variability.)

If our purpose is to make a generally useful summary of the distribution 
of variability, then the aggregated values should serve our needs. There has 
been some compression in the interest of apparent simplicity, but not too 
much. If our purpose is to make a point estimate of an individual variance 
component, then we should not even aggregate. Even a failure to prove a 
variance component nonzero does not justify an estimate of zero. (We may 
well wish to supplement our point estimate with an interval estimate in any 
case. Currently available methods have been discussed by Bross [4], whose 
method has been slightly modified by Tukey [21], pp. 66-68.) 

In stating estimated variance components, we are often trying to arrive 
at a simple (and at times rough) description of the data. From our point of 
view, all variance components are almost sure to be nonzero. However, many 
may be so small that a general overview of the data is more effective when 
they are omitted. In these situations estimates of variance components should 
show the relative importance of the variables. Statistical significance is 
secondary. 

Kogan [14] notes that some investigators do not calculate high-order 
interactions, but lump them with the residual term for a composite error 
term. This saves some work, since the high-order interactions require lengthy 
calculations, but, as Kogan has pointed out, involves dangers. If some of 
the lumped interactions are large, the composite error term will be too large, 
so that precision may be sacrificed. Moreover, information about these large 
interactions will be lost, and we are less likely to inquire about their causes. 
A nice balance between extra work in analysis and possible loss of valuable 
information is required. It usually seems to pay to go at least somewhat 
further than has been the typical practice in the past. Going all the way is
likely to be wiser than not going far enough. 

If a complex analysis, which has not been taken all the way, has a number 
of mean squares notably smaller than the mean square for pooled error, the 
analysis should be pushed further. The original Johnson and Tsao analysis 
provides a clear example where this rule should be applied. Their error 
mean square is 154 and, above this error term, there are 12 mean squares
less than 30. 


The Illustrative Example Again 


An analysis of the difference-limen experiment will provide a detailed 
example of the aggregation and estimation procedures that we suggest for 
complex experiments. Following Johnson and Tsao, we shall first use the 
DL’s, in grams, as the dependent variable. Table 2 shows the sums of squares, 
degrees of freedom, original mean squares, and aggregated mean squares for 
the experiment. Each “line” is represented by the entries in corresponding
positions of the various two-way tables, its designation being found by
combining row and column labels.

In the aggregation process, begin with RWDP, the lowest line in the table,
whose observed mean square is 26. Every combination of RWD, RD, or RW
with any of P, SI, S, I, and “—” is less than 52. Thus aggregate all of these.
WP (observed MS = 39), DS (observed MS = 9) and DI (observed MS = 4)
are also less than 52, but cannot be aggregated unless one also aggregates
WDP (observed MS = 74) and WDS (observed MS = 106), which lie
between, and these are too large for the rule of thumb. It may seem too
bad not to aggregate a 9 and a 4, but we must be consistent.

Combining the 15 sums of squares and the 15 corresponding degrees of freedom,

    1043 + 320 + ... + 61 + ... + 205 + 527 = 6381,
      72 +  18 + ... +  3 + ... +  18 +  18 =  312.

The aggregated mean square is 6381/312 = 20, which is labeled by RWDP,
the label of the lowest line involved.

The lowest line now remaining is WDP with an observed mean square
of 74. One can aggregate WDSI, WDS, WDI, WD, WP, and WS with this.
(While DS, DI, and D are small enough, DP lies between with an observed
mean square of 178.) The aggregated sum of squares and degrees of freedom
are

    943 + 405 + 1779 + 379 + 637 + 340 + 488 = 4971,
     24 +   6 +   24 +   6 +   6 +   6 +   6 =   78.

The aggregated mean square is 4971/78 = 64.
The lowest lines now left are DP, WSI, WI, W, and RP, no one of which is
part of another. DP can be aggregated with DSI, DS, DI, and D to yield

    712 + 270 + 9 + 4 + 116 = 1111,
      4 +   1 + 1 + 1 +   1 =    8,

and an aggregated mean square of 139.
No other aggregation is possible at this step. We have reduced 39 lines
to 15 and can reasonably proceed to the remainder of the analysis. 
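
The arithmetic of the three aggregation steps can be checked with a short sketch. It is only an illustration of rule (b) and of the pooling of sums of squares and degrees of freedom; the function names are hypothetical, and the first two steps use the totals quoted in the text, since the individual entries of Table 2 are not reproduced here.

```python
# Sketch of the rule-of-thumb check and the pooling arithmetic described above.
# Function names are illustrative; the numbers are those quoted in the text.

def within_rule_of_thumb(ms, basic_ms):
    """Rule (b): a line qualifies only if its MS is less than twice the basic MS."""
    return ms < 2 * basic_ms


def aggregated_ms(sums_of_squares, degrees_of_freedom):
    """Pool sums of squares and degrees of freedom, then form the mean square."""
    return sum(sums_of_squares) / sum(degrees_of_freedom)


basic = 26  # observed mean square of RWDP, the first basic line
observed = {"WP": 39, "DS": 9, "DI": 4, "WDP": 74, "WDS": 106}
small_enough = {line: within_rule_of_thumb(ms, basic) for line, ms in observed.items()}

ms_rwdp = aggregated_ms([6381], [312])  # totals for the 15 pooled lines
ms_wdp = aggregated_ms([943, 405, 1779, 379, 637, 340, 488], [78])  # df total only
ms_dp = aggregated_ms([712, 270, 9, 4, 116], [4, 1, 1, 1, 1])

print(small_enough)
print(round(ms_rwdp), round(ms_wdp), round(ms_dp))
```

WP, DS, and DI pass rule (b) but WDP and WDS do not, which is why the first three cannot be aggregated with RWDP; the three rounded mean squares agree with the values 20, 64, and 139 given above.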
All aggregated analysis of variance tables will have the following columns. 


Line = label of interaction or main effect. 
DF = degrees of freedom. 
MS = mean square. 
AvMS = average mean square. 
DEN = value of denominator, or error term for this mean square. 
DDF = denominator degrees of freedom. 
SIG = level of significance of F-ratio, F = MS/DEN; * = 5%, ** = 1%,
*** = 0.1%.
DIV = coefficient of the variance component of this line in AvMS.
s² = estimated variance component = (MS − DEN)/DIV.
pooled s² = estimated variance component computed from pooled mean
squares (insignificant MS’s being pooled with their DEN’s).


The final analysis is shown in Table 3. In the AvMS column appear 
only those variance components for which lines are present in the (aggregated) 
table. The main effects of Person and Rate predominate. The variance com- 
ponent for S is estimated to be large, but the mean square is not significantly 
larger than its error term. In addition to the main effects of P and R, there 
are some small, but significant interactions: RS, RP, W, WI, WSI, DP, 
and WDP. 

At this point, it is interesting to compare our results with those of 
Johnson and Tsao. In the analysis of the difference limen experiment, Johnson 
and Tsao neglected the Persons variable. All Person interactions, and the 
main effect of Persons, were lumped in the error term. Thus the error term 
was made too large. On the other hand, all interactions were tested against 
this error term. Insignificant interactions were pooled with the error term 
to test the main effects. The significant lines according to their analysis
were SIR, SI, SR, IW, IR, S, I, W, and R. These overlap our significant
lines in only three cases (R, W, and WI).

In summarizing the results of the experiment, we must have a graph 
showing the mean limen for each person at each rate, since R, P, and RP 
are all significant. Other graphs are needed, but the implications of this 
graph, shown in Figure 1, force us to reconsider our analysis. For each person, 
the line connecting the four rate points is nearly linear, and, if extended, 
would nearly meet the origin. (We might well have noted this phenomenon 
before the analysis, and saved ourselves much labor! The implication is that 
the effect of Rate can be explained by assuming that each person in the 
experiment responds after a constant Time, regardless of the Rate.) If this 
is true, we would obtain a more clear-cut analysis by using response times,
rather than difference limens in grams, as the dependent variable. It will be
instructive to compare an analysis of response times with the analysis of
DL’s.

FIGURE 1
Average DL for Each Person-Rate Combination


Reanalysis in Terms of Response Times 


When the data were transformed to response times, it was found that 
the standard deviations were approximately linearly related to the means. 
This circumstance, which is usually found in latency data, suggests using 
the logarithm of response time as the dependent variable for the next analysis, 
so that the standard deviations would be more homogeneous. Table 4 shows 
the complete set of mean squares for the analysis of log times. Table 5 shows 
the final analysis-of-variance table. 
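
The reasoning behind the logarithmic transformation can be illustrated with a small sketch. The response times below are hypothetical (they are not data from the experiment): each condition multiplies the same set of times by a constant, so the raw standard deviations are proportional to the means, and taking logarithms makes them equal.

```python
import math
import statistics

# Hypothetical response times (arbitrary units); not data from the experiment.
base = [0.8, 0.9, 1.0, 1.1, 1.25]
scales = [1, 2, 4]  # each condition multiplies every time by a constant

raw_sds, log_sds = [], []
for scale in scales:
    times = [scale * t for t in base]
    raw_sds.append(statistics.stdev(times))
    log_sds.append(statistics.stdev([math.log10(t) for t in times]))

# Raw standard deviations grow in proportion to the condition means ...
print([round(sd, 3) for sd in raw_sds])
# ... while the standard deviations of the logarithms are all the same.
print([round(sd, 4) for sd in log_sds])
```

Real latency data will follow the proportionality only approximately, so the transformed standard deviations would be more homogeneous rather than exactly equal.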

The main effect of Person clearly predominates over all other effects in this
analysis. Figure 2 shows the mean response times for each individual on
each day. (Psychophysical experiments often show individual differences
in sensitivity, which may be especially marked in unpracticed observers
and can usually be reduced substantially by thorough training. The choice
of naive or trained observers depends on the purpose of the experiment.)

TABLE 4

-      27, 89%   34,021   76,860   37,852      -
R          €2        87       40      130     36
W          48       147       19      539    551
D         731      2091      489       36    920
RW         22        25       21       21     45
RD         30        72       32       31     19
WD         63        60       63       40     39
RWD        28        10       25       13     16

The rate components were almost entirely eliminated by the transformation
to response times. The conclusion that response time was virtually
constant for the four rates of increase (for fixed person, weight, and date)
has important implications for the method of continuous change. It suggests
that one can get whatever difference limen one wants by choosing the
presentation rate appropriately. Both Pal [16] and Grindley [10] found that
DL’s were functions of presentation rate, though neither found the
proportionality noted in this study. We also note that the effect of weight in
the present study was not at all in accord with the Weber function. The
peculiar effects of weight and rate indicate that the method of continuous
change may be very different from the usual psychophysical methods.

FIGURE 2
Response Time for Each Person on Each Date
(Arithmetic means of logarithms are equivalent to geometric means of times.)

FIGURE 3
Average Response Time for Each Weight-Sex-Sight Combination

Among the secondary effects, the person-date interaction is large (see 
Fig. 2). The weight, weight-sight, and weight-sex-sight lines are small but 
significant. Figure 3 shows the mean response time for weight-sex-sight 
combinations. The weight-date-person interaction is caused almost entirely 
by the atypical behavior of one individual, whose data are shown on Fig. 3. 
All other persons showed much less variation on the weight variable. Since 
the form of the atypical curve is similar to the form of many DL curves near 
the absolute threshold, some sort of pre-experimental adaptation effect 
might explain the anomaly. 


Portrayal of Variability


We have emphasized that the true purpose of analysis of variance is 
increased understanding of the results of the experiment, and have made 
use of graphs of mean values to this end. Estimates of variability, too, are 
likely to require graphical presentation. The columns of estimated variance 
components in Tables 3 and 5, for example, can be pored over and gradually 
understood to a degree. But, if we include the unaggregated variance com- 
ponents, there are six sets, three from each analysis. We may wish to compare 
them, and will often find cumulative graphs, like Fig. 4, helpful in doing this. 

No one line in Fig. 4 tells the whole story. The first three are based on 
DL’s in grams, and we have learned that this is unwise. The other three are 
based on logarithmic response times, and show no trace of the original apparent 
rate dependence. We can present the whole picture more completely, as in 
Fig. 5, if we take rate as estimated, from the pooled components in Table 
3, to provide 32 percent of the variability, and divide up the remaining 68 
percent among the pooled components in Table 5. We can gain effectiveness 
by using the two dimensions of the picture. (This also makes visibility 
directly related to standard deviation.) 

FIGURE 4
Cumulative Graphs of Variance Components

FIGURE 5
Relative Variability in DL Experiment


Summary 


Application of the analysis of variance to complex experiments is dis- 
cussed, with emphasis on the use of the analysis to understand the data. 
It is noted that the choice of the scale for the dependent variable is important. 
It is also important to recognize crossed and nested categories. The variance 
components model is used to specify the structure of the experiment and to 
determine the appropriate error term for each mean square. The error terms 
depend on whether the classes of each independent variable are (1) all out 
of a few or (2) a few out of many. A process of aggregation is described and 
illustrated to reduce the number of mean squares that need further study. 
The question of further pooling is discussed, and a graphical technique for 
portraying variance components is presented. 

An illustrative example shows the utility of transforming the dependent 
variable. It illustrates nested classification and provides an example of the 
aggregation technique. The results of the experiment can be summarized 
very concisely: differences between individuals predominated; day-to-day
variation and weight interactions were of secondary importance; for fixed 
weight, date, and person, response times were virtually constant, regardless 
of rate; one individual on one day behaved atypically. 


Appendix on Computation Formulas


1. Sums of squares

Computational formulas for all sums of squares can be written as shown
below. Define:

$$n_R = 4,\quad n_W = 7,\quad n_D = 2,\quad n_P = 2,\quad n_S = 2,\quad n_I = 2;$$
$$N = n_R n_W n_D n_P n_S n_I = 448.$$

(Here P represents persons within each SI combination. It does not represent
PSI.) Let x represent the N individual values of the dependent variable,
i.e., $x = x_{RWDPSI}$.

Define bracketed terms as in the following examples:

$$[SI] = \frac{n_S n_I}{N} \sum_{S,I} \Bigl( \sum_{R,W,D,P} x \Bigr)^2,$$
$$[RWD] = \frac{n_R n_W n_D}{N} \sum_{R,W,D} \Bigl( \sum_{P,S,I} x \Bigr)^2,$$
$$[DPSI] = \frac{n_D n_P n_S n_I}{N} \sum_{D,P,S,I} \Bigl( \sum_{R,W} x \Bigr)^2,$$
$$[RWDPSI] = \sum_{R,W,D,P,S,I} x^2.$$

There is one bracketed term corresponding to each line in the analysis.
There is also a bracketed term for the grand mean:

$$[-] = \frac{1}{N} \Bigl( \sum_{R,W,D,P,S,I} x \Bigr)^2.$$

With these definitions, the computational formulas for sums of squares
(SS) for crossed classifications may be written as in the following examples.

$$SS_{RWD} = [RWD] - [RW] - [RD] - [WD] + [R] + [W] + [D] - [-],$$
$$SS_{SI} = [SI] - [S] - [I] + [-].$$

For the lines including persons (nested in SI), the sums of squares may be
written as in the following examples.

$$SS_{P(SI)} = [PSI] - [SI],$$
$$SS_{DP(SI)} = [DPSI] - [DSI] - [PSI] + [SI],$$
$$SS_{RWP(SI)} = [RWPSI] - [RWSI] - [RPSI] + [RSI] - [WPSI] + [WSI] + [PSI] - [SI],$$
$$SS_{RWDP(SI)} = [RWDPSI] - [RWDSI] - [RWPSI] + [RWSI] - [WDPSI] + [WDSI] - [RDPSI] + [RDSI]$$
$$\qquad\qquad + [RPSI] - [RSI] + [WPSI] - [WSI] + [DPSI] - [DSI] - [PSI] + [SI].$$
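
The bracketed terms and the alternating-sign pattern of the sums of squares can be sketched in a few lines of Python. This is a minimal illustration, not the computing procedure used for the paper: the data are made-up numbers laid out in the same 4 × 7 × 2 × 2 × 2 × 2 design, and the function names (`bracket`, `ss_crossed`, `ss_with_persons`) are hypothetical. A bracketed term is computed as the sum over cells of (cell total)² divided by the number of observations per cell, which is the same quantity as in the definitions above.

```python
from itertools import combinations, product

FACTORS = "RWDPSI"  # index order of each observation
LEVELS = {"R": 4, "W": 7, "D": 2, "P": 2, "S": 2, "I": 2}


def bracket(data, keep):
    """Bracketed term [keep]: sum over cells of (cell total)^2 / (observations per cell)."""
    positions = [FACTORS.index(f) for f in keep]
    totals, counts = {}, {}
    for index, x in data.items():
        cell = tuple(index[p] for p in positions)
        totals[cell] = totals.get(cell, 0.0) + x
        counts[cell] = counts.get(cell, 0) + 1
    return sum(t * t / counts[c] for c, t in totals.items())


def subsets(letters):
    """All subsets of a string of factor letters, including the empty string."""
    for size in range(len(letters) + 1):
        for combo in combinations(letters, size):
            yield "".join(combo)


def ss_crossed(data, line):
    """SS for a line of crossed factors only, e.g. 'RWD' or 'SI'."""
    return sum((-1) ** (len(line) - len(t)) * bracket(data, t) for t in subsets(line))


def ss_with_persons(data, crossed):
    """SS for a line containing P (nested in SI); crossed='D' gives DP(SI)."""
    return sum(
        (-1) ** (len(crossed) - len(t))
        * (bracket(data, t + "PSI") - bracket(data, t + "SI"))
        for t in subsets(crossed)
    )


# Made-up observations, one per cell of the design (N = 448).
indices = list(product(*(range(LEVELS[f]) for f in FACTORS)))
data = {idx: float((7 * i + 3 * sum(idx) ** 2) % 11) for i, idx in enumerate(indices)}

ss_rwd = ss_crossed(data, "RWD")
ss_dp = ss_with_persons(data, "D")

# All 39 lines: 31 crossed lines plus 8 lines involving persons.
lines = [ss_crossed(data, t) for t in subsets("RWDSI") if t]
lines += [ss_with_persons(data, t) for t in subsets("RWD")]
total_ss = bracket(data, FACTORS) - bracket(data, "")
print(len(lines), round(sum(lines), 6), round(total_ss, 6))
```

The last line is a useful check on any hand computation: the 39 sums of squares must add up to the total sum of squares, [RWDPSI] − [−].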








2. Degrees of freedom for linear combinations of mean squares 


The method used is adapted from Fairfield Smith [20], Welch [24],
Satterthwaite [18], and Cochran [5]. If the error term for a particular mean
square, $DEN_A$, is a sum (or difference) of other mean squares, e.g.,

$$DEN_A = MS_B + MS_C - MS_D + \cdots,$$

then the degrees of freedom for this error term, $DDF_A$, is obtained from the
formula

$$\frac{(DEN_A)^2}{DDF_A} = \frac{(MS_B)^2}{DF_B} + \frac{(MS_C)^2}{DF_C} + \frac{(MS_D)^2}{DF_D} + \cdots.$$

Example: Consider line P in Table 3.

$$DEN_P = MS_{RP} + MS_{DP} - MS_{RWDP},$$
$$450 = 331 + 139 - 20.$$

$$\frac{(450)^2}{DDF_P} = \frac{(331)^2}{12} + \frac{(139)^2}{8} + \frac{(20)^2}{312},$$
$$DDF_P = 17.5.$$






An alternative method, which has been proposed by Cochran [5] and which 
he believes to be safer, lacks flexibility, not being as appropriate for making 
interval estimates, or for comparing individual means. 
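
The degrees-of-freedom approximation above is easy to mechanize. The following sketch (the function name is hypothetical) takes each mean square with its sign and degrees of freedom and reproduces the worked example for line P.

```python
def approximate_ddf(terms):
    """terms: list of (sign, mean_square, df). Returns (DEN, DDF).

    DEN is the signed sum of the mean squares; DDF solves
    DEN^2 / DDF = sum of MS^2 / DF over all terms (signs do not enter here).
    """
    den = sum(sign * ms for sign, ms, _ in terms)
    ddf = den ** 2 / sum(ms ** 2 / df for _, ms, df in terms)
    return den, ddf


# Line P of Table 3: DEN_P = MS_RP + MS_DP - MS_RWDP
den_p, ddf_p = approximate_ddf([(+1, 331, 12), (+1, 139, 8), (-1, 20, 312)])
print(den_p, round(ddf_p, 1))
```

Note that a subtracted mean square still adds to the right-hand side of the formula, so it lowers the approximate degrees of freedom in two ways: by shrinking DEN and by enlarging the sum.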









3. Estimating variance components after pooling

Rather than recompute the entire table to find new pooled $s^2$'s, a formula
that approximates the results of a reanalysis after pooling is

$$\text{pooled } s_A^2 = s_A^2 + \sum_B \frac{DIV_B \, DF_B}{DIV_A \,(\text{pooled } DF)}\, s_B^2$$
$$\qquad = s_A^2 + \frac{1}{DIV_A \,(\text{pooled } DF)} \sum_B DF_B \,(MS_B - DEN_B),$$

where the sum is over all B to be pooled with line A.
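
A brief sketch may help in applying this formula. The numbers are hypothetical, and two assumptions are made that the text does not state explicitly: that "pooled DF" means the degrees of freedom of line A plus those of all lines pooled with it, and, for the comparison at the end, that each pooled line B has line A as its error term (DEN_B = MS_A). Under those assumptions the formula agrees with a direct recomputation from the pooled mean square.

```python
def pooled_component(s2_a, div_a, df_a, pooled_lines):
    """Approximate pooled s^2 for line A.

    pooled_lines: list of (DF_B, MS_B, DEN_B) for the lines pooled with A.
    Assumes pooled DF = DF_A + sum of DF_B (an interpretation, not stated in the text).
    """
    pooled_df = df_a + sum(df for df, _, _ in pooled_lines)
    correction = sum(df * (ms - den) for df, ms, den in pooled_lines)
    return s2_a + correction / (div_a * pooled_df)


# Hypothetical line A and two insignificant lines that use A as their error term.
ms_a, den_a, div_a, df_a = 50.0, 20.0, 4.0, 12
s2_a = (ms_a - den_a) / div_a
pooled_lines = [(6, 62.0, ms_a), (3, 44.0, ms_a)]

by_formula = pooled_component(s2_a, div_a, df_a, pooled_lines)

# Direct recomputation: pool the sums of squares, then re-estimate.
pooled_df = df_a + sum(df for df, _, _ in pooled_lines)
pooled_ms = (df_a * ms_a + sum(df * ms for df, ms, _ in pooled_lines)) / pooled_df
direct = (pooled_ms - den_a) / div_a

print(round(by_formula, 4), round(direct, 4))
```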


VARIMAX SOLUTION FOR PRIMARY MENTAL ABILITIES*


Henry F. Kaiser 
UNIVERSITY OF ILLINOIS 


The varimax solution for Thurstone’s classic Primary Mental Abilities 
study is presented. Comparisons between the factors of Thurstone’s original
subjectively rotated factor pattern, Zimmerman’s subjectively revised
solution, Wrigley, Saunders, and Neuhaus’ quartimax results, and the present
varimax factor matrix are made by finding correlations between factors
defined by these four solutions. It is pointed out that any possible ultimate
merit of the varimax solution should be based on its psychological meaning- 
fulness and on the rationale of the varimax criterion—not on its relationship 
to the other studies. 


In addition to its distinguished substantive status, Thurstone’s classic 
Primary Mental Abilities study [8] has been used repeatedly as a sounding 
board on which to try new methodological techniques. In particular, four 
revised factor-analytic bases for a common-factor space derived from 
Thurstone’s original correlation matrix have been published [2, 4, 9, 10]. 
In this paper a solution determined by the varimax criterion [5] for analytic 
rotation is presented. 

The results of this varimax orthogonal rotation of Thurstone’s original 
13 centroid factors in the common-factor space thus defined are given in 
Table 1. A psychological interpretation of these varimax factors will not be 
attempted here; it would seem that these data are so well known, and the 
interpretation of the major varimax factors so readily apparent, that it would 
be more appropriate to allow the reader independently to assess the varimax 
solution. 

However, quantitative comparisons of the varimax solution will be made 
with three other orthogonal bases for this same common-factor space: as 
determined subjectively by Thurstone [8] in his original monograph, as 
determined by Zimmerman [10] in a subjective revision of Thurstone’s 
solution, and as determined by Wrigley, Saunders, and Neuhaus [9] accord- 
ing to their quartimax criterion for analytic rotation [6, 7]. In Table 2 are 
presented correlation matrices between the 13 factors for each of the six 
possible pairs of solutions. These correlation matrices are the transformation 
matrices (or their transposes) for rotating any one solution into any other. 
[If $A_1$ and $A_2$ are two orthogonal solutions to be compared, and $T_{12}$ is the
orthogonal transformation matrix for rotating $A_1$ into $A_2$: $A_1 T_{12} = A_2$, then

*The computations for this paper were done on Illiac, an electronic computer of the
Digital Computer Laboratory of the University of Illinois. Mr. S. M. Hunka assisted in
these computations.

TABLE 1

Varimax Solution for Primary Mental Abilities

4. Reading I O07 73 ae. OS 2). O03 BY: S32 a 11 14 05 
5. Reading II ih 8 13 05 15 12 19 2 o7 2 09 12 06 
6. Verbal Classification ML 6-43 +35 563 S>> $2 a5. - 2h 13 03 O02 2 -16 
7- Word Grouping a9 ° SB 8s re ae ep ap Oy 23° 2 ase 
6. Figure Classification LQ -Ok 16 «Ob 53 08 19 03 -~-06 03 -10 20 is] 
9. Controlled Association OT 8S? 35 Ob 9a ah aD 55 a as 68 SINS 23 
10. Inventive Opposites 038 55 48 13 Ob 19 05 13 O7 06 2 26 17 
11. Completion 25 59 50 -08 05 31 O06 12 2 10 #19 -13 -Oh 
12. Disarranged Words 22 20 = ane age « MS Saige Rs Re So ae es 
13. First and Last Letter ok 22 67 ey] 1l 05 a2 @2 «lf G1 /O1 2 03 
14. Disarranged Sentences 22° 85° 25) 4B! 09 9 57. 09). 23> 505-290 33 30 «O11 
15. Anagrams OF 05:66 aT 98100) 90 ar? 03° 20 11-36 06 
16. Inventive Synonyms Ol 45 28 sal ia Me. 23 ae eS. a 8 0 88 
17. Block Counting 64 O07 ou. RR g 17 -15 -O4 l2 “Ol 31 -21 1b 
18. Cubes 15 =O 22" 68" e565" eB a OS OG Sas 
_ —— A o7 29 4 13. 06 = 23 = 03 30 10 OF -02 
- Flags 83 02 0 20 -07 0 01 a5 605° 35 2 «23 
21. Form Board TO. -2O-: 25.33 23 a, 8: 02. 6 Sf 05 6BO ‘22 
22. Lozenges B fe] 00 03-05 03 30. 22-23. 50) 33. 49 19 -07 
23. Surface Development 71 16 10 O7 O2 O7 O7 27 Me 5 05 12 
24. Punched Holes 67 26 00 09 27 O07 25 03 06 31 O7 -OT 11 
25. Mechanical Movements ae EE | ee TOS) 8 OT) Sah OO) 68h O5 lear ay 
26. Identical Forms 3]. 29 OB 10 OT 57 -O1 13 06 6 ~20 ~-12 ~20 
27. Pursuit ©3 00 10 2 19 06 7 Oh 10 -21 O1 21 13 
26. Copying a9 "99 22° (20 oD) Bo 09 9G 9 ES: OF OE: 8 
29. Areas be Be. (03 NAD) SP) 0208 Ay Bs, 85. P2003) 9.) OF 
30. Number Code 45 09 1 65 s wa 19 =-05 10 30-17 ~06 
31. Addition 14 ~O4 03 78 10 10 05 ill “~7 “7 Ol 04 00 
32. Subtraction 07 14 18 7h 05 7 13 09 Oo2 08 se Ge x 12 
33. Multiplication 1h -0l 15 8 02 12 -15 08 27 Ob. Ob 05 00 
34. Division 21 11 12 6 09 OF 13 O09 43 -16 03 08 -17 
35. Tabular Completion ae 28 as ae GS? ee ea 8 07 Ob 08 
36. Estimating 23 13 03 Oh 03 OF 15 02 08 Ob 60 -0h -03 
37. Number Series 21 #i2 06 Nes 53 “4 = 25 22 = oo 4 4 
38. Numerical Judgment 8 21 03 0: |). 1 1S 2 £3 
39. Arithmetical Reasoning 35 38 30 "SOG 20 aD 02 50. 09 0 
40. Reasoning I, ee eee ey ee en <a © Mi ie i700 pt 
41. Verbal Analogies = ro = d - 4 74 : : = > 3 4 = 
. 8 - 
43. atone i8 i o7 26 3% 1B 1 ei 00 2 7 9 26 
44, Pattern Analogies 36 29 «29 «02 23 os 4 Be 4 3 1 19 09 
45. Syllogisms Sf 0 a0 ee) ah Pe 
16. W 23° oh Be Oh 303° -25') 208753 AOL | 2. OL a 
46. Word Number oh 3” @ ww 0 oO 1 7 213. 25 03 03 215 
4 iggnercene ae 03 Oh 03 22 14 12 05 7 19 19 02 08 -22 
ae 3a 16. 38 25 00: 05. ah 38. 37> 63 5) DE. 07 
49. Word Recognition 6 . a ne an Se “Ee ys 
50. Figure Recognition 12 45 ett 03 68 “13 08 08 16 17 00 Ob 
51. Picture Recall 05 03 03 oe 27 21 13 1 2 © 32 2 
62. Theme ke oh ol 05 2 2 Ol OD AB 
- ‘ patie 21 27 21 05 1 03 0 O1 08 05 -10 52 10 
55. Sound Grouping 32.30 29 21 Mh 2 1h 09 06 1 08 55 -16 
56. Spelling — Ol 32 49 1h 09 19 29 ~03 33 03 16 19 08 
57. Grammar 13 29 54 18 10 00 32 03 22 OF 22 33 08 
50. Vocabulary (Chicago) 09 92 20 09 -17 09 #02 915 12 OF -10 00 UL 
59. Word Count 05 18 02 2h 03 52 1k 05 OF -O7 -14 02 09 
60. Vocabulary 12 Gc 8B. 0 2 2 dao 32 BS eR SR 06. 338 
Ss 793 634 418 4o2 261 279 217 216 191 177 174 170 9 





$T_{12} = (A_1'A_1)^{-1}A_1'A_2$. Or, if rotating $A_2$ into $A_1$: $A_2 T_{21} = A_1$, then similarly
$T_{21} = (A_2'A_2)^{-1}A_2'A_1$. And, it is readily seen that $T_{12} = T_{21}^{-1} = T_{21}'$, since the
transformations are orthogonal.] Since the ordering of the rows and columns
of these matrices is arbitrary, they are presented so that their traces are 
maximized—thus ordering as nearly as possible the comparable factors for 
the two solutions represented. 
This juggling is not so straightforward as might be imagined, because of
the possibility of reflecting factors. In order to emphasize comparisons of 
major factors—while keeping the traces maximized—minor factors have 
sometimes been reflected (as indicated by minus signs in Table 2). 
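
The transformation matrix relating two orthogonal solutions can be illustrated on a toy case. The sketch below uses two factors instead of thirteen and a made-up loading matrix, so that a hand-written 2 × 2 inverse suffices; the helper names are hypothetical. It forms T = (A′A)⁻¹A′B for a solution B obtained by rotating A, and shows that the matrix for the reverse rotation is the transpose.

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transformation(a_from, a_to):
    """T = (A'A)^{-1} A'B, the matrix carrying a_from into a_to (two factors only)."""
    at = transpose(a_from)
    return matmul(inverse_2x2(matmul(at, a_from)), matmul(at, a_to))


theta = math.radians(30)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

a1 = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.3, 0.6]]  # hypothetical loadings
a2 = matmul(a1, rotation)                               # second orthogonal solution

t12 = transformation(a1, a2)  # recovers the rotation
t21 = transformation(a2, a1)  # its inverse, which equals its transpose

print([[round(v, 4) for v in row] for row in t12])
print([[round(v, 4) for v in row] for row in t21])
```

With thirteen factors the same formula applies, but a general matrix inverse (or a least-squares routine) would replace the 2 × 2 helper.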

These comparisons have not been extended to two further studies of 
PMA—those of Holzinger and Harman [4] and of Eysenck [2]. These latter 
studies proceed a priori in an attempt to generate a general factor plus more 
or less mutually exclusive group factors, and thus necessarily are distinctly 
different from the four studies compared here—all of which have pretensions 
to what is usually called simple structure. Also, neither Holzinger and Harman 
nor Eysenck used the same common-factor space as Thurstone (they started 
afresh from the correlation matrix) and quantitative comparisons of the sort 
exhibited in Table 2 are thus not available. 

Upon studying the matrices in Table 2, it does not seem unreasonable 
to assert that Thurstone, Zimmerman, quartimax, and varimax are, in 
general, manifestations of the same viewpoint for explaining the nature of 
the psychological universe sampled by Thurstone’s 57 variables—these 
matrices all seem reasonably close to being diagonal where major factors are 
being compared. More particularly, it is interesting that the correlations 
between number factors defined by the different rotational solutions generally 
are higher than other correlations in Table 2. This implies that a number 
factor is well established—or, at least, that there is a great deal of unanimity 
in its definition. This observation is consistent with the more extensive 
results of French [3] and Ahmavaara [1], both of whom in monographs 
attempting to chart the cognitive domain repeatedly found a number factor 
most strongly defined. Similarly, but to a lesser extent, there is unanimity 
among the four solutions in defining spatial, verbal, (word) fluency, deduction, 
and memory factors. Additionally, Zimmerman’s visualization factor [10] runs 
clearly through the four solutions. On the other hand, there seems only 
equivocal consensus in defining factors within the perceptual and within 
the induction and reasoning realms. 

It has not been suggested that the varimax solution be considered an 
approximation to simple structure, nor has an attempt been made to establish 
the validity of this solution by its relationship to the other solutions; the 
above comparisons are intended merely to show relations among the several 
solutions. 

Whatever ephemeral value may obtain from the comparisons, any 
permanent merit the present varimax solution may have will lie in the first 
section of this paper—in Table 1 not Table 2. More generally, it would appear 
desirable, in the light of the rise of analytic criteria for rotation, to cease 
paying obeisance to a notion (simple structure) which operationally, ap- 
parently of necessity, must remain a subjective, scientifically unpalatable, 
art. The rationale of the varimax criterion is based ultimately on the more 
fundamental psychometric concept of factorial invariance [5]; thus, it would 
seem that the present varimax solution must be judged finally on its
psychological meaningfulness in its own right and not on its relationship to
subjective simple structure solutions.


A COGNITIVE PROBABILITY MODEL FOR LEARNING


John E. Overall*
JOHNS HOPKINS UNIVERSITY


A quantitative model for the behavior of albino rats in choice-making
situations is presented. The model, which is based upon a cognitive
conceptualization of the learning process, is shown to yield predictions which
are equivalent to those produced by the linear operator stochastic models
at the asymptotic limit but which differ from these during early trials in the
learning situation.


Consider a hypothetical cognitive rat motivated to find food in a simple 
T-maze. A cognitive creature, its behavior is mediated by a plan of action 
based upon past experience. A simple organism, it entertains within aware- 
ness only one perceptual organization at a time. The plan of action adopted 
on any trial is assumed to depend entirely upon the single stimulus-behavior- 
outcome perceptual organization active in awareness at a critical instant 
at the beginning of the trial. Stimuli in the T-maze are assumed to interact 
with the needs of the organism to activate a search mechanism which selects 
from memory storage the trace of a perceptual organization from past ex- 
perience, and the past experience which is selected by the search mechanism 
determines the nature of the plan of action adopted for the trial. Thus, the 
probability of a particular response on any trial is assumed to be equal to 
the probability that a memory trace appropriate to that response will be in 
awareness at the critical instant when the plan of action crystallizes. 

Experiences in a simple T-maze are assumed to result most readily in
perceptual organizations favorable to the formation of two distinct plans of
action. The stimulus-behavior-outcome sequences (1) T-maze, turn right,
find food, and (2) T-maze, turn left, find no food may be considered
right-favorable, i.e., favorable to the development of a right-going plan of action.
The stimulus-behavior-outcome sequences (3) T-maze, turn left, find food,
and (4) T-maze, turn right, find no food may be considered left-favorable,
i.e., favorable to the development of a left-going plan of action.

*This work was done while the author was a National Science Foundation
Postdoctoral Fellow in the Psychometric Laboratory of the University of North Carolina.
The author is indebted to Dr. Lyle V. Jones, Director of the Psychometric Laboratory,
for considerable time and effort in reading and criticizing early drafts of the manuscript.
The help of Mr. Douglas K. Spiegel in criticizing an early draft is also gratefully
acknowledged.

The total memory system is conceived to be compartmentalized with
individual storage compartments identified according to motivational and
stimulus values. All memory traces with similar motivational-stimulus 
connotations are deposited together in an appropriately situated storage 
compartment. For example, in this discussion attention is directed toward 
the storage compartment in which are deposited traces appropriate to the 
hunger drive in a T-maze situation. Within each storage compartment
traces are deposited chronologically, with recent deposits laid over less recent
ones. At all times the search mechanism is assumed to move to the storage
compartment with the highest motivational-stimulus value. Upon arriving
at the proper storage compartment, the search mechanism randomly selects
a single stimulus-behavior-outcome trace which it then brings into conscious 
awareness. Because of the chronology of deposits within each storage com- 
partment, the probability of the search mechanism selecting a recently 
deposited trace is assumed to be greater than the probability of its selecting 
a less recent one. If the probability of recalling each successively less recent 
event is assumed to decay by a constant proportion k upon the occurrence of 
each new event, the total probability of recall can be written as the sum of 
the independent probabilities of recalling each of the n mutually exclusive 
traces deposited within the memory compartment. 


$$a + ak + ak^2 + \cdots + ak^{n-1} = 1, \tag{1}$$

where a is the probability of recalling the most recently deposited trace
and $ak^{n-1}$ is the probability of recalling the least recently deposited trace.

The independence of probabilities of recalling successively less recent
events depends upon the conceptualization of the memory recall mechanism
and in no way involves assumptions concerning the independence of trials
in the learning situation.

Since the sum of all terms on the left of (1) equals unity, a and k are not 
free to vary independently for any given value of n. In terms of the con- 
ceptual model, k is assumed to be primary in its representation of the decay 
in probability of recalling successively less recent events; hence a, represent- 
ing the likelihood of recalling the most recent event, is assumed to depend 
upon the relative availability for recall of the less recent events. For any 
given number of deposits n, the relative likelihood of recalling the most
recent event, $a_n$, is determined by the rate of decay, or reduction in availability
of less recent traces. If k is a constant fraction between 0 and 1, and n is the
number of traces available, the value of a determined by (1) is

(2)    $a_n = \dfrac{1}{\sum_{i=0}^{n-1} k^i} = \dfrac{1 - k}{1 - k^n}.$

Examination of (1) reveals that the value of a approaches $1 - k$ as n
approaches infinity. This characteristic will prove important when discussing
implications of the model with respect to trial to trial changes and average
levels of response probability. For all n less than infinity, $a_n$ equals
$1 - a_n k / a_{n-1}$. This yields predictions of trial to trial changes in response
probability which differ in a significant respect from predictions based upon
linear operator models, as will be discussed in a later section.
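
The relations among a, k, and n stated in (1) and (2) can be checked numerically. The Python sketch below is illustrative only: the function names and the decay value k = 0.8 are assumptions introduced here and do not come from the paper.

```python
def most_recent_weight(k, n):
    """Return a_n, the probability of recalling the most recent of n traces.

    This solves a + a*k + ... + a*k**(n-1) = 1 for a, as in equation (2).
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k is assumed to lie strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be a positive number of traces")
    return (1.0 - k) / (1.0 - k ** n)


def recall_weights(k, n):
    """Return [a, a*k, ..., a*k**(n-1)], ordered from most to least recent."""
    a = most_recent_weight(k, n)
    return [a * k ** i for i in range(n)]


if __name__ == "__main__":
    k = 0.8  # assumed decay constant, chosen only for illustration
    for n in (1, 2, 5, 20, 100):
        # a_n shrinks toward 1 - k as n grows; the weights always sum to 1.
        print(n, most_recent_weight(k, n), sum(recall_weights(k, n)))
```

With few traces the most recent event dominates recall (a single trace gives a = 1), and as n grows the value of a settles toward 1 - k.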


Predicting the Probability of Response for a Single Rat 


Consider a hypothetical cognitive rat with the past experience of ten 
previous trials in a simple T-maze. Suppose that the ten trials have left the
following stimulus-behavior-outcome traces, ordered chronologically from
most recent to least recent.


Recency    Stimulus    Behavior     Outcome
   1       T-maze      Run left     Find food
   2       T-maze      Run right    Find food
   3       T-maze      Run right    Find food
   4       T-maze      Run right    Find no food
   5       T-maze      Run left     Find no food
   6       T-maze      Run right    Find no food
   7       T-maze      Run left     Find food
   8       T-maze      Run left     Find no food
   9       T-maze      Run right    Find food
  10       T-maze      Run right    Find food


The probability that the search mechanism will select a trace which 
will result in the plan of action “run right in search of food” is the sum of 
the independent probabilities associated with recalling each of the mutually 
exclusive right-favorable events deposited within memory storage. The 
terms appearing on the left-hand side of (1) can be written as elements in a
vector of probabilities of recall $\mu = [a, ak, ak^2, \ldots, ak^{n-1}]$. An experience
vector can be written for the individual subject with elements 0 and 1 representing
the left-favorable and the right-favorable experiences, respectively,
$e = [0, 1, 1, \ldots, 1]$. The probability that a subject will follow a right-going
plan of action on any particular trial is the inner product of the experience
vector and the vector of probabilities of recall,


(3)    $p_{R,n+1} = e \cdot \mu'.$
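
The inner product in (3) can be sketched for the ten traces tabled above. In the Python illustration below, the decay value k = 0.8 and all identifiers are assumptions made for the example. A trace is coded 1 (right-favorable) when the rat ran right and found food or ran left and found none.

```python
def recall_weights(k, n):
    """Return [a, a*k, ..., a*k**(n-1)] with a = (1 - k) / (1 - k**n)."""
    a = (1.0 - k) / (1.0 - k ** n)
    return [a * k ** i for i in range(n)]


# (behavior, found_food) for the ten traces in the table, most recent first.
TRACES = [
    ("left", True), ("right", True), ("right", True), ("right", False),
    ("left", False), ("right", False), ("left", True), ("left", False),
    ("right", True), ("right", True),
]


def favors_right(behavior, found_food):
    """Right-favorable: ran right and found food, or ran left and found none."""
    return (behavior == "right") == found_food


def experience_vector(traces):
    """Code each trace as 1 (right-favorable) or 0 (left-favorable)."""
    return [1 if favors_right(b, food) else 0 for b, food in traces]


def p_right(traces, k):
    """Equation (3): inner product of experience and recall vectors."""
    e = experience_vector(traces)
    mu = recall_weights(k, len(traces))
    return sum(e_j * mu_j for e_j, mu_j in zip(e, mu))


if __name__ == "__main__":
    print(experience_vector(TRACES))
    print(p_right(TRACES, k=0.8))  # k = 0.8 is an assumed value
```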


A special case of the general model is the two-alternative choice situation
where the probability of response to one alternative is equal to one minus
the probability of response to the other alternative, i.e., $p_{L,n+1} = 1 - p_{R,n+1}$.
Let the left-favorable and right-favorable experience vectors constitute a
two-row experience matrix with general element $f_{i,j}$, where $f_{i,j}$ is the probability
that the event of trial j was favorable to alternative i. It is assumed
that, in the two-alternative case, each event in the experimental situation
is favorable to one and only one of the alternatives. Multiplication of the
experience matrix E by the vector of probabilities of recall $\mu'$ results in a
vector of response probabilities $p'_{n+1}$,


(4)    $E \cdot \mu' = p'_{n+1}.$


For the general multiple-alternative case, an m X n experience matrix 
is written, with m equal to the number of response alternatives and n equal 
to the number of experiences or events. The general element $f_{i,j}$ is the probability
that the jth event was perceived as favorable to the ith alternative.
As illustrated above, each $f_{i,j}$ element is either 0 or 1 for situations where
the correction method is employed; however, when the noncorrection pro- 
cedure is employed the elements can take on intermediate values. In all 
cases, the sum of the elements in any column of the experience matrix must 
equal unity, i.e., the events of each trial are assumed to be favorable to some 
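alternative. For an m-alternative situation, the probabilities of response
can be calculated as follows.

$\begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,n} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,n} \\ \vdots & \vdots & & \vdots \\ f_{m,1} & f_{m,2} & \cdots & f_{m,n} \end{bmatrix} \begin{bmatrix} a \\ ak \\ \vdots \\ ak^{n-1} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \\ \vdots \\ p_{m,n+1} \end{bmatrix}$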
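
As a hedged illustration of the m-alternative product of experience matrix and recall vector, the Python sketch below uses a hypothetical three-alternative record of four events. The matrix entries, the value k = 0.7, and the function names are all assumptions for the example. The intermediate values in the third column stand for an event of the noncorrection kind.

```python
def recall_weights(k, n):
    """Return [a, a*k, ..., a*k**(n-1)] with a chosen so the weights sum to 1."""
    a = (1.0 - k) / (1.0 - k ** n)
    return [a * k ** i for i in range(n)]


def response_probabilities(experience, k):
    """Multiply an m x n experience matrix by the recall vector.

    experience[i][j] is the probability that the j-th most recent event was
    perceived as favorable to alternative i; every column must sum to 1.
    """
    n = len(experience[0])
    if any(len(row) != n for row in experience):
        raise ValueError("all rows must have the same number of events")
    for j in range(n):
        if abs(sum(row[j] for row in experience) - 1.0) > 1e-9:
            raise ValueError("column %d does not sum to 1" % j)
    weights = recall_weights(k, n)
    return [sum(f * w for f, w in zip(row, weights)) for row in experience]


# Hypothetical record: three alternatives, four events, most recent first.
E = [
    [1.0, 0.0, 0.50, 0.0],
    [0.0, 1.0, 0.25, 0.0],
    [0.0, 0.0, 0.25, 1.0],
]

if __name__ == "__main__":
    p = response_probabilities(E, k=0.7)
    print(p, sum(p))
```

Because every column of the experience matrix sums to unity and the recall weights sum to unity, the resulting response probabilities also sum to unity.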




















Predicting the Proportion of a Group of Rats Choosing Each Alternative 


If a large group of rats were given t trials in a simple T-maze with
probability of reward on the right $\pi_R$ and probability of reward on the
left $\pi_L$, the procedure outlined above could be used to predict the probability
with which each rat would choose each alternative on each trial. The proportion
of the group of rats expected to turn right on any trial would then
be the mean of these individual probabilities. However, the proportion of a
large group of rats expected to choose each alternative on any trial n + 1
may be calculated directly by constructing an experience matrix $\bar{E}$, with
elements $f_{i,j}$ representing the proportion of the group experiencing an event
favorable to alternative i on trial j. As above, the columns of the experience
matrix represent discrete trials, ordered according to relative recency, with
the first column representing the most recent trial. A vector of probabilities
of recall $\bar{\mu}$ can be constructed from the terms appearing on the left-hand
side of (1), with elements ordered from largest to smallest corresponding
to the chronological ordering of memory deposits from most recent to least
recent. The elements in this case represent average probabilities of recalling
successively less recent events, since individual organisms probably differ
with respect to rate of decay of memory traces. The proportion of the group
of subjects predicted to choose each alternative on trial n + 1 is the product
of the experience matrix and the vector of probabilities of recall,


(5)    $\bar{E} \cdot \bar{\mu}' = \bar{p}'_{n+1}.$


For example, the proportions of a large group of rats predicted to choose 
the right and left alternatives in a simple T-maze might be calculated:


$\begin{bmatrix} f_{R,n} & f_{R,n-1} & \cdots & f_{R,1} \\ f_{L,n} & f_{L,n-1} & \cdots & f_{L,1} \end{bmatrix} \begin{bmatrix} a \\ ak \\ ak^2 \\ \vdots \\ ak^{n-1} \end{bmatrix} = \begin{bmatrix} p_{R,n+1} \\ p_{L,n+1} \end{bmatrix},$


where $f_{R,n}$ is the proportion of the group observed to (a) turn right and find
food or (b) turn left and find no food on trial n; $f_{L,n}$ is the proportion of the
group observed to (a) turn left and find food or (b) turn right and find no
food on trial n.

Predictions of the proportion of a large group which will choose each 
of m alternatives on trial n + 1 are calculated in exactly the same manner. 
An experience matrix has m rows and n columns. Each element $f_{i,j}$ is the
proportion of the large group of rats which experienced an event favorable
to alternative i on trial j.


Trial to Trial Changes in Response Probabilities 


For an individual subject in a simple T-maze, the probability of experiencing
a right-favorable event on trial n is the sum of two probabilities,
that of turning right and finding food, and that of turning left and finding
no food. Since the probability of turning left is equal to one minus the probability
of turning right, the probability of experiencing a right-favorable
event can be written


(6)    $f_{R,n} = p_{R,n} \cdot \pi_R + p_{L,n}(1 - \pi_L),$
       $f_{R,n} = p_{R,n} \cdot \pi_R + (1 - p_{R,n})(1 - \pi_L),$


where $f_{R,n}$ is the probability of the event of trial n being perceived as right-favorable,
$p_{R,n}$ is the probability of response to the right-hand alternative
on trial n, and $\pi_R$ and $\pi_L$ are probabilities of reward on the right and left,
respectively.

It has been assumed that the probability of a right-going response is
equal to the sum of the independent probabilities of recalling right-favorable
experiences, and that the probability of recalling each specific experience
decreases by a fraction upon the occurrence of each new event. Therefore,
any previous sum-probability, such as that associated with recalling a right-favorable
event on trial n, will decay by the same proportion upon the occurrence
of the new event on trial n + 1. Thus, if the amount of decay in the
previous response probabilities can be determined, it will be possible to
calculate the probabilities of response directly from the response probabilities
on the preceding trial. From the basic model, the probability of recalling the
most recent experience is $a_n$ on trial n + 1. Since the total response probability
on trial n + 1 is unity, the increment resulting from probability of recalling
the most recent event must be exactly balanced by a compensating decrease
in probability of recalling less recent events. For the simple two-alternative
case, this relationship is


(7)    $a_n \begin{bmatrix} p_{1,n} \cdot \pi_1 + p_{2,n}(1 - \pi_2) \\ p_{2,n} \cdot \pi_2 + p_{1,n}(1 - \pi_1) \end{bmatrix} + \dfrac{a_n}{a_{n-1}} k \begin{bmatrix} p_{1,n} \\ p_{2,n} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \end{bmatrix}.$


Each of the vectors in (7) is assumed to sum to unity, i.e., the sum of
probabilities of response to alternatives 1 and 2 on trial n, the sum of probabilities
that the event of trial n was perceived as favorable to either alternative
1 or 2, and the sum of probabilities of response to the two alternatives
on trial n + 1. From this assumption it follows that $a_n$ equals $1 - (a_n k / a_{n-1})$.
However, this relationship between a and k follows directly from (1) if k is
assumed invariant. Since a approaches $1 - k$ as n approaches infinity, the
ratio $a_n / a_{n-1}$ approaches unity and drops out of (7) as n approaches infinity.
Because the two elements in each vector are linearly dependent, solution for
trial to trial changes in response probability for the two-alternative case
involves only a single scalar equation. In the general m-alternative case,
there are $m - 1$ independent probabilities.
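
A single application of the trial to trial relationship (7) can be sketched as follows. The Python code is a minimal illustration under stated assumptions: the function names and the numerical values in the usage lines are hypothetical, and n is taken to be the number of traces available once the newest event has been deposited, with at least one earlier trace.

```python
def most_recent_weight(k, n):
    """Return a_n = (1 - k) / (1 - k**n), as in equation (2)."""
    return (1.0 - k) / (1.0 - k ** n)


def favorable_to_first(p_first, pi_first, pi_second):
    """Equation (6): the event favors the first alternative if that
    alternative was chosen and rewarded, or the other was chosen unrewarded."""
    return p_first * pi_first + (1.0 - p_first) * (1.0 - pi_second)


def next_probability(p_first, n, k, pi_first, pi_second):
    """One row of equation (7), giving the response probability on trial n + 1."""
    if n < 2:
        raise ValueError("this sketch needs at least one earlier trace")
    a_n = most_recent_weight(k, n)
    carry_over = a_n * k / most_recent_weight(k, n - 1)  # equals 1 - a_n
    f = favorable_to_first(p_first, pi_first, pi_second)
    return a_n * f + carry_over * p_first


if __name__ == "__main__":
    # Assumed values: p_1 = 0.4, five traces, k = 0.8, pi_1 = 0.7, pi_2 = 0.2.
    p1_next = next_probability(0.4, 5, 0.8, 0.7, 0.2)
    p2_next = next_probability(0.6, 5, 0.8, 0.2, 0.7)
    print(p1_next, p2_next, p1_next + p2_next)
```

Applying the same function to each alternative in turn gives two probabilities that sum to unity, which is the linear dependence noted in the text.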

For the simple two-alternative case, the change in probabilities of response
resulting from a single trial can be determined by subtracting the
probabilities of response on trial n from the corresponding probabilities
on trial n + 1,


(8)    $\begin{bmatrix} \Delta p_{1,n} \\ \Delta p_{2,n} \end{bmatrix} = a_n \begin{bmatrix} p_{1,n} \cdot \pi_1 + p_{2,n}(1 - \pi_2) \\ p_{2,n} \cdot \pi_2 + p_{1,n}(1 - \pi_1) \end{bmatrix} - \left(1 - \dfrac{a_n}{a_{n-1}} k\right) \begin{bmatrix} p_{1,n} \\ p_{2,n} \end{bmatrix}.$

A similar equation can be written for the general m-alternative case, where
each vector contains m elements.

Comparison of the above result, developed from the cognitive probability
model, with results obtained by Estes [3] and Bush and Mosteller [2]
from contiguity-association and S-R reinforcement assumptions, shows
definite similarities. Although the present work was not motivated by a
conscious attempt to duplicate the results discussed by these writers, and in
fact the similarities were not at first recognized [5], similarity to the “gain-loss
form” of the linear operator model [2] is especially striking. A contribution
of the present model is perhaps found in the conceptual origin of the
constants and in the relationship between the gain and loss coefficients as
expressed in (1).

While in the two-alternative case, failure of reward following response
to one alternative results in a perceptual organization favorable to the other
alternative, in an m-alternative (noncorrection) experiment, failure of reward
following response to one alternative leaves uncertainty as to which of the
remaining alternatives is perceived as favorable. In an m-alternative situation
a perceptual organization favorable to one and only one alternative on
each trial is realized in one of two distinct ways. The subject may choose
alternative $A_j$ and be rewarded for his choice. In this case a perceptual
organization favorable to future response to alternative $A_j$ results. On the
other hand, the subject may choose alternative $A_j$ and fail to be rewarded.
In this case the perceptual organization will not favor alternative $A_j$. Instead
the cognition might be verbalized in the following manner: “$A_j$ was not
correct; I should have chosen $A_i$ (different from $A_j$).” The search mechanism
is activated by the unsuccessful outcome to bring forth an experience favorable
to one of the other alternatives, and the $A_i$ perceived as favorable is determined
by the trace made available by the search mechanism. The resulting
perceptual organization is favorable to response to alternative $A_i$ if recalled
on a future trial. The probability that an unsuccessful response will be perceived
as favorable to any particular $A_i$ is equal to the ratio of the probability
of recalling a trace favorable to that particular $A_i$ to the probability of
recalling a trace favorable to any alternative other than the one chosen
without reward on that trial. Failure of reward following response to alternative
$A_j$ will result in a trace favorable to alternative $A_i$ with probability
$p_{i,n}/(1 - p_{j,n})$, where $p_{i,n}$ is the probability of recalling a trace favorable
to $A_i$ on trial n, and $p_{j,n}$ is the probability of recalling a trace favorable to
$A_j$ on trial n.

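In the simple two-alternative case, the probability that the event of
trial n will be perceived as favorable to each of the alternatives is


(6a)   $f_{1,n} = p_{1,n} \cdot \pi_1 + \dfrac{p_{1,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2),$

       $f_{2,n} = p_{2,n} \cdot \pi_2 + \dfrac{p_{2,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1).$

For the three-alternative case, the vector giving probabilities that the
event of trial n will be perceived as favorable to each of the three alternatives
is


(6b)   $f_{1,n} = p_{1,n} \cdot \pi_1 + \dfrac{p_{1,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2) + \dfrac{p_{1,n}}{1 - p_{3,n}}\, p_{3,n}(1 - \pi_3);$

       $f_{2,n} = p_{2,n} \cdot \pi_2 + \dfrac{p_{2,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1) + \dfrac{p_{2,n}}{1 - p_{3,n}}\, p_{3,n}(1 - \pi_3);$

       $f_{3,n} = p_{3,n} \cdot \pi_3 + \dfrac{p_{3,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1) + \dfrac{p_{3,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2).$
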


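For the four-alternative case, the probabilities that the event of trial n
will be perceived as favorable to each of the four alternatives are


(6c)   $f_{1,n} = p_{1,n} \cdot \pi_1 + \dfrac{p_{1,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2) + \dfrac{p_{1,n}}{1 - p_{3,n}}\, p_{3,n}(1 - \pi_3) + \dfrac{p_{1,n}}{1 - p_{4,n}}\, p_{4,n}(1 - \pi_4);$

       $f_{2,n} = p_{2,n} \cdot \pi_2 + \dfrac{p_{2,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1) + \dfrac{p_{2,n}}{1 - p_{3,n}}\, p_{3,n}(1 - \pi_3) + \dfrac{p_{2,n}}{1 - p_{4,n}}\, p_{4,n}(1 - \pi_4);$

       $f_{3,n} = p_{3,n} \cdot \pi_3 + \dfrac{p_{3,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1) + \dfrac{p_{3,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2) + \dfrac{p_{3,n}}{1 - p_{4,n}}\, p_{4,n}(1 - \pi_4);$

       $f_{4,n} = p_{4,n} \cdot \pi_4 + \dfrac{p_{4,n}}{1 - p_{1,n}}\, p_{1,n}(1 - \pi_1) + \dfrac{p_{4,n}}{1 - p_{2,n}}\, p_{2,n}(1 - \pi_2) + \dfrac{p_{4,n}}{1 - p_{3,n}}\, p_{3,n}(1 - \pi_3).$
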
These illustrations can be generalized to the m-alternative case. Note 
that the number of terms on the right-hand side of an equation is the number 
of response alternatives. The first term is the probability of response to the 
particular alternative in question multiplied by the probability of reward 
associated with that response; the remaining terms involve a fraction of the 
probability of response to one of the other alternatives multiplied by the 
probability of not being rewarded. The vector 


$f_n = [f_{1,n} \;\; f_{2,n} \;\; \cdots \;\; f_{m,n}]'$


forms one column in the experience matrix E presented in (4). 
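
The pattern of terms in (6a) through (6c) can be written once for any number of alternatives. The Python sketch below is an illustration under assumptions: the function name and the example probabilities are hypothetical, and each response probability is taken to be strictly less than 1 so that the ratios are defined.

```python
def favorable_probabilities(p, pi):
    """Return [f_1, ..., f_m] for response probabilities p and reward
    probabilities pi, following the pattern of (6a)-(6c).

    f_i = p_i * pi_i + sum over j != i of p_i / (1 - p_j) * p_j * (1 - pi_j)
    """
    if len(p) != len(pi):
        raise ValueError("p and pi must have one entry per alternative")
    if any(p_j >= 1.0 for p_j in p):
        raise ValueError("each response probability must be below 1")
    m = len(p)
    f = []
    for i in range(m):
        total = p[i] * pi[i]  # alternative i chosen and rewarded
        for j in range(m):
            if j != i:
                # alternative j chosen without reward, trace assigned to i
                total += p[i] / (1.0 - p[j]) * p[j] * (1.0 - pi[j])
        f.append(total)
    return f


if __name__ == "__main__":
    # Assumed three-alternative values, for illustration only.
    f = favorable_probabilities([0.5, 0.3, 0.2], [0.8, 0.4, 0.1])
    print(f, sum(f))
```

For m = 2 the ratio $p_{1,n}/(1 - p_{2,n})$ equals one, so the function reduces to (6), and for any m the elements sum to unity, as a column of the experience matrix must.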


Asymptotic Level of Response Probability


Consider a cognitive rat placed in a T-maze situation with reward
determined randomly with probabilities $\pi_1$ and $\pi_2$ at the right and left,
respectively. (It is not necessary to assume that $\pi_1 + \pi_2 = 1$.) From (8),
the probabilities of response will change from trial to trial as a result of
successive experiences. What is the average level around which response
probabilities will be expected to fluctuate under various probabilities of
reinforcement at the two alternatives? The simplest answer to this question
is that the probabilities of response will be expected to approach the level
at which the change from trial to trial is zero. Since the response probabilities
to the two alternatives are linearly dependent, it is necessary to consider the
probability of response to only one alternative. Setting the trial to trial
change in response probability, as defined in (8), equal to zero, the following
is obtained (substituting $p_{2,n} = 1 - p_{1,n}$).


$a_n [p_{1,n} \cdot \pi_1 + (1 - p_{1,n})(1 - \pi_2)] - \left(1 - \dfrac{a_n}{a_{n-1}} k\right) p_{1,n} = 0.$


But after many trials, $a_\infty = 1 - k$, so that

$a_\infty [p_{1,\infty} \cdot \pi_1 + (1 - p_{1,\infty})(1 - \pi_2)] - a_\infty p_{1,\infty} = 0.$

Factoring $a_\infty$, expanding, and rearranging terms,

$p_{1,\infty} \cdot \pi_1 + 1 - p_{1,\infty} - \pi_2 + p_{1,\infty} \cdot \pi_2 - p_{1,\infty} = 0,$

and

$p_{1,\infty}(\pi_1 + \pi_2 - 2) + 1 - \pi_2 = 0.$

Solving for the expected probability of response to alternative 1,


(9)    $p_{1,\infty} = \dfrac{1 - \pi_2}{2 - \pi_1 - \pi_2}.$

The result presented in (9) is considered appropriate to the behavior of
large groups of subjects, as well as individual animals. In the one case, $p_{1,\infty}$
is the average probability of a single rat choosing $A_1$ after many trials, and
the value may be obtained as the proportion of several trials on which the
alternative was chosen. In the other case, $p_{1,\infty} = f_{1,\infty}$, the proportion of a
large group of rats which is expected to turn right after many trials in the
situation. Once more, it should be noted that the result is identical to that
presented by Estes [3] and Bush and Mosteller [2] for asymptotic response
probability.
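
The asymptote in (9) can be checked by iterating the expected trial to trial change until it vanishes. The Python sketch below assumes the limiting case a = 1 - k. The function names, starting probability, and reward probabilities are illustrative choices, not values from the paper.

```python
def asymptote(pi_1, pi_2):
    """Equation (9): expected long-run probability of choosing alternative 1."""
    if pi_1 == 1.0 and pi_2 == 1.0:
        raise ValueError("(9) is undefined when both alternatives always pay")
    return (1.0 - pi_2) / (2.0 - pi_1 - pi_2)


def iterate_expected(p_1, pi_1, pi_2, k, trials):
    """Apply the expected update p <- a*f + k*p repeatedly, with a = 1 - k."""
    a = 1.0 - k
    for _ in range(trials):
        f_1 = p_1 * pi_1 + (1.0 - p_1) * (1.0 - pi_2)  # equation (6)
        p_1 = a * f_1 + k * p_1
    return p_1


if __name__ == "__main__":
    # Assumed values: start at 0.5, pi_1 = 0.7, pi_2 = 0.2, k = 0.8.
    print(iterate_expected(0.5, 0.7, 0.2, 0.8, 500))
    print(asymptote(0.7, 0.2))
```

When $\pi_1 + \pi_2 = 1$ the asymptote equals $\pi_1$, the familiar probability-matching value.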

A similar solution for asymptotic levels of response probabilities is 
conceivable for the general m-alternative case; however, since there are
$m - 1$ independent equations, a simultaneous solution is required. This is
done rather easily for any arbitrarily chosen values of $\pi_1, \pi_2, \ldots, \pi_m$ and
has been undertaken by the writer for the three-alternative case. Asymptotic 
levels obtained in this way do not correspond to those obtained by employing 
the Bush and Mosteller model. It appears that the predicted asymptotic 
values are more extreme, approaching the 1.00 and 0 probability levels more 
closely than would be predicted from the linear operator model. 


The Learning Curve


Suppose that a hypothetical cognitive rat is placed in a simple T-maze
situation for the first time. Reward is always present on the right ($\pi_1 = 1.00$)
and never present on the left ($\pi_2 = 0$). Equation (9) suggests that after many
trials the average level of response probability to the right-hand alternative 
will approach 1.00. Consider now the manner in which response probability 
approaches this final level. 

Only those events which occur within the experimental situation have 
been considered in discussing the determination of responses. In prediction 
of actual behavior it becomes necessary to consider generalized past ex- 
periences which may involve events occurring outside the experimental 
situation, but which nevertheless are relevant in formulating a plan of action 
in the experimental situation. A simplifying assumption is that past ex- 
periences in searching for food and in turning right and left in the home cage 
have filled the rat's memory to capacity with generalized traces appropriate
to the formulation of a plan of action in the T-maze. This is similar to assuming
an infinitely large number of relevant past experiences, and this results
in $a = 1 - k$ from equation (1). It may be further assumed that the past
experiences of the subject have left it unbiased with respect to right-favorable
and left-favorable traces, i.e., on trial 1, $p_{1,1} = p_{2,1} = .50$. Because food is
always on the right and never on the left, each experience in the T-maze
will result in a right-favorable trace. Probabilities of response following
successive trials can be written:


following trial 1,  $p_{1,2} = a + p_{1,1} \cdot k;$
following trial 2,  $p_{1,3} = a + ak + p_{1,1} \cdot k^2;$
following trial 3,  $p_{1,4} = a + ak + ak^2 + p_{1,1} \cdot k^3;$
and so on, until,
following trial n,  $p_{1,n+1} = a + ak + ak^2 + \cdots + ak^{n-1} + p_{1,1} \cdot k^n.$


The process continues with the probability of recalling a specific right-favorable
event in the T-maze increasing after each trial and the probability
of recalling a generalized past experience (equal to $p_{1,1}$ on trial 1) decreasing
after each trial. As the exponent on the final term approaches $\infty$, the probability
of recalling a generalized past experience approaches zero and the
probability of response to the right approaches unity.

For the two-alternative case, where $a = 1 - k$ and where reward is
always located at one alternative, the probability of response to alternative
j on trial n + 1 can be expressed as


(10)   $a \sum_{i=0}^{n-1} k^i + k^n p_{j,1} = p_{j,n+1}.$

The first term in (10) represents the sum of probabilities of recalling
specific events occurring in the experimental situation while the second term
represents the sum (implicit) of probabilities of recalling generalized past
experiences favorable to alternative j. It would be possible, and in the present
development advantageous, to distinguish between the probability of recalling
the most recent experience a and the probabilities of recalling all less recent
experiences occurring in the experimental situation

$a \sum_{i=0}^{n-1} k^i - a = a \sum_{i=1}^{n-1} k^i.$

The total probability of recalling an event favorable to the alternative j
is the sum of three terms: the probability of recalling the most recently
deposited trace, the sum of the probabilities of recalling all other experiences
from the experimental situation, and the probability of recalling a
j-favorable trace from generalized past experiences. In this case,


(11)   $a + a \sum_{i=1}^{n-1} k^i + k^n p_{j,1} = p_{j,n+1}.$

Note that it is reasonable to combine the second and third terms into a
single term representing the sum of all probabilities associated with recalling
j-favorable traces prior to the most recently deposited one; this is equivalent
to the probability of response to alternative j on the trial n, multiplied by k,


(12)   $a \sum_{i=1}^{n-1} k^i + k^n p_{j,1} = k \cdot p_{j,n}.$


Thus the trial to trial changes in response probability can be written
in terms of the response probability on the preceding trial plus the change in
that response probability resulting from the events of the most recent trial,


(13)   $a + k \cdot p_{j,n} = p_{j,n+1}.$
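
The agreement between the summed form (10) and the one-step form (13) can be verified directly. The following Python sketch assumes a = 1 - k and reward always at alternative j. The function names, the starting probability .50, and k = 0.8 are illustrative assumptions.

```python
def closed_form(p_start, k, n):
    """Equation (10): probability of choosing alternative j after n trials."""
    a = 1.0 - k
    return a * sum(k ** i for i in range(n)) + k ** n * p_start


def by_recurrence(p_start, k, n):
    """Equation (13) applied n times: p <- a + k * p."""
    a = 1.0 - k
    p = p_start
    for _ in range(n):
        p = a + k * p
    return p


if __name__ == "__main__":
    # Assumed values: unbiased start (.50) and k = 0.8.
    for n in (0, 1, 2, 5, 10, 25):
        print(n, closed_form(0.5, 0.8, n), by_recurrence(0.5, 0.8, n))
```

Both forms trace the same learning curve, which rises from the initial value toward unity as the weight $k^n$ on generalized past experience shrinks.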


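In the general two-alternative situation, reward may not always be
presented only at one alternative; hence it becomes necessary to consider
the probability that the most recent event was perceived as favorable to the
jth alternative, the probabilities $f_{j,n}$ defined in (6), (6a), (6b), and (6c). In
the two-alternative case, the following formulation is most general, where
$a = 1 - k$,


(14)   $a \begin{bmatrix} p_{1,n} \cdot \pi_1 + p_{2,n}(1 - \pi_2) \\ p_{2,n} \cdot \pi_2 + p_{1,n}(1 - \pi_1) \end{bmatrix} + k \begin{bmatrix} p_{1,n} \\ p_{2,n} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \end{bmatrix}$

or


(14a)  $a \begin{bmatrix} f_{1,n} \\ f_{2,n} \end{bmatrix} + k \begin{bmatrix} p_{1,n} \\ p_{2,n} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \end{bmatrix}.$


For the m-alternative case, where $a = 1 - k$,


(14b)  $a \begin{bmatrix} f_{1,n} \\ f_{2,n} \\ \vdots \\ f_{m,n} \end{bmatrix} + k \begin{bmatrix} p_{1,n} \\ p_{2,n} \\ \vdots \\ p_{m,n} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \\ \vdots \\ p_{m,n+1} \end{bmatrix}.$
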
The special case $a = 1 - k$ reduces the model to what is essentially
linear operator form, but this involves very special assumptions concerning
the unlimited past experience of the organism. From a purely formal point
of view, note that the functional relation of a and k, as expressed in (1),
introduces another degree of freedom into the cognitive probability model.
Whenever the number of available past experiences cannot be assumed to
approach infinity, the value of $a_n = 1 - (a_n k / a_{n-1})$. Relaxing the restriction
that $a = 1 - k$, the general expression for trial to trial changes in response
probabilities becomes


(15)   $a_n \begin{bmatrix} f_{1,n} \\ f_{2,n} \\ \vdots \\ f_{m,n} \end{bmatrix} + \dfrac{a_n}{a_{n-1}} k \begin{bmatrix} p_{1,n} \\ p_{2,n} \\ \vdots \\ p_{m,n} \end{bmatrix} = \begin{bmatrix} p_{1,n+1} \\ p_{2,n+1} \\ \vdots \\ p_{m,n+1} \end{bmatrix}.$



That a exceeds $1 - k$ during early trials in the experimental situation
has important consequences for trial to trial predictions of response probability.
Depending upon the number of available traces assumed at the
initial experimental trial, $a_n$ will be larger than $1 - k$, from (2), and the change
in response probability resulting from each trial will be greater than would
be predicted if $a = 1 - k$. Where some events are favorable to one alternative
and some to another, trial to trial fluctuations predicted by the present
model will be greater than those predicted from the linear operator models;
yet as n increases the model retains advantages of the linear operator models
in predicting asymptotic levels of response probability.
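
The size of this early-trial effect can be illustrated by comparing one step of (15) with the limiting linear operator step. In the Python sketch below, the function names and the values k = 0.9, p = 0.5, f = 1 are assumptions chosen for the example, and n (at least 2) is the number of traces available after the new event.

```python
def most_recent_weight(k, n):
    """Return a_n = (1 - k) / (1 - k**n), as in equation (2)."""
    return (1.0 - k) / (1.0 - k ** n)


def change_finite(p, f, k, n):
    """Change in response probability from one row of (15), for n >= 2."""
    a_n = most_recent_weight(k, n)
    carry_over = a_n / most_recent_weight(k, n - 1) * k
    return a_n * f + carry_over * p - p


def change_linear_operator(p, f, k):
    """Change in response probability when a = 1 - k, as in (14a)."""
    return (1.0 - k) * f + k * p - p


if __name__ == "__main__":
    # Assumed values: k = 0.9, current p = 0.5, fully favorable event (f = 1).
    for n in (2, 3, 10, 50, 500):
        print(n, change_finite(0.5, 1.0, 0.9, n))
    print("limit", change_linear_operator(0.5, 1.0, 0.9))
```

With only a few traces in memory the predicted change is several times the linear operator value, and the two agree as n becomes large.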


Discussion and Conclusions


It may be viewed as encouraging that models grounded in diverse conceptualizations
concerning the nature of the learning process should produce
similar results. (Simon [8] has demonstrated that similar results for asymptotic
behavior can be derived on the assumption that the subject is behaving in a
purposeful and rational manner and attempting to minimax his regret.)
However, although several probability models produce similar results under 
certain conditions, distinctions between them should not be minimized at 
the present stage of theoretical development. 

Treating the total probability of recall as the sum of independent prob- 
abilities of recalling discrete perceptual organizations emphasizes the need 
to investigate factors affecting the availability of memory traces. Reduction 
in probability of recalling specific traces as a function of the interference 
produced by intervening events is considered; however, other factors which 
have been demonstrated to affect probability of recall in experimental investi- 
gations of memory might well be incorporated into the “memory function” 
if empirical results in the context of probability learning suggest their importance.
Thus, the model provides a framework for incorporating a considerable
body of research which has hitherto remained isolated from the more general 
theories of learning. 

Whereas most contemporary theories of learning emphasize either 
frequency or recency as the sole principle of association, the cognitive prob- 
ability model emphasizes the interaction of these two factors. Although the 
values of a and $1 - k$ may be assumed to be equal under certain conditions,
and in consequence several results of the present model can be equated to 
those of the linear operator models, the fact that predictions from the present 
model actually depend upon two interrelated coefficients should be recognized. 
A troublesome problem for the linear operator models is that the constant 
or constants which produce the best fit to a smoothed learning curve fre- 
quently fail to provide adequate trial to trial variation in response probability 
during early trials. For treating this problem, a “two-coefficient” model
appears to offer certain advantages. If the organism is assumed to enter the
learning situation with fewer than an infinite number of relevant past experiences,
then $a > 1 - k$ and the present model affords greater trial to trial
variation during early trials while retaining certain advantages of the linear 
operator models in predicting average levels of response probability following 
many trials. 

A perceptual theory of learning provides flexibility in choice of learning 
units which Bush and Mosteller ([2], p. 189) infer to be essential in providing 
maximal utility to a quantitative model. “The utility of the general model 
depends upon both the formal structure of the mathematical system and the 
appropriateness of the identifications made. It could turn out that the general 
model might be quite adequate with one set of identifications but wholly 
inadequate with another. For example, it is conceivable that in a two-choice 
situation, say choosing ‘right’ or ‘left’, some people would behave as if the 
response were right or left, whereas for others ‘same’ or ‘opposite’ of previous 
trial would be the appropriate identifications of responses.” It is difficult 
to see how adherence to a strict S-R association or reinforcement theory can 
provide this kind of flexibility. Conception of the learning unit as a complex 
S-S perceptual organization permits the cognitive model to deal successfully 
with behavior which appears to defy explanation in terms of simple associa- 
tion theories. For example, a rat can learn to alternate responses between 
two well-spaced levers in a Skinner-type apparatus [9]. Response to a lever, 
or reinforcement following response to a lever, is observed to reduce the 
probability of responding to the same lever on the next trial. This behavior 
can be predicted by recognizing S-S perceptual organizations which include 
relationships between trials; the probability of the subject doing
what it perceived itself to be doing on the previous successful trial is increased [6].

Generalization of any current probability learning model to human 
behavior should be undertaken with caution, since human behavior in
learning experiments fails to conform to the “positive recency” prediction
of the probability learning theories [1, 7]. The cognitive probability model 
presented here also involves a positive recency principle operating on the 
perceptual organizations of the organism on past trials. A key to rationalizing 
the obvious differences between the behavior of rats and human subjects 
in simple two-alternative choice situations may be found in the nature of 
the perceptual organizations assumed to be active in the different species. 
Although it is frequently necessary to consider intertrial relationships in 
accounting for human behavior, the choice behavior of rats can more often 
be accounted for in terms of intratrial perceptual organizations. 
AN ANALYSIS OF GUTTMAN’S SIMPLEX*


Philip H. DuBois
WASHINGTON UNIVERSITY


Applying a Spearman formula for factor loadings to a variant of the 
diagonal method, the Guttman simplex model is factored algebraically into 
n/2 additive factors. The finding that communalities can be discovered such 
that the rank of a simplex becomes n/2 is contradictory to Guttman’s contention
that the minimal rank is $n - 2$. Certain matrices of 4 and 5 variables
presented by Guttman as simplexes, can, in general, be considered 2-factor 
matrices, easily analyzed to simple structure without rotation. One example 
of 6 variables is factored by the method described to a 3-factor structure. 


When any two variables, i and j, are represented on just a single common
factor, $g_a$, three consequences may be noted:

(i) the correlation between elements of a certain vector of coefficients
involving variable i (a vector designated vector i) and the corresponding
elements of vector j is 1.00;

(ii) the ratio of any element in vector i to the corresponding element in
vector j is a constant (this is the criterion of equiproportionality);

(iii) estimates of the communality of variable i based on solving second-order
minors involving i and j and a third correlated variable are invariant,
as are the estimates of the communality of variable j. Thus, Spearman’s
formula, $h_i^2 = r_{ij} r_{ik} / r_{jk}$, yields constant values when k is taken as each of
the other $(n - 2)$ variables in turn.


For variables i, j, ..., n let the uncorrelated common factors with
unit variances be $g_a, g_b, \ldots, g_m$, in which the variables are weighted a,
b, ..., m, respectively. Let u be the unique variance (including error) of
any variable, with weight w. Then,

(1)    $z_i = a_i g_a + w_i u_i,$
(2)    $z_j = a_j g_a + w_j u_j,$
(3)    $z_k = a_k g_a + b_k g_b + \cdots + m_k g_m + w_k u_k,$
(4)    $z_l = a_l g_a + b_l g_b + \cdots + m_l g_m + w_l u_l,$
(5)    $z_n = a_n g_a + b_n g_b + \cdots + m_n g_m + w_n u_n.$


*Prepared under Contract 816(02) between the Office of Naval Research and Washing- 
ton University and presented in part at a meeting of the Southern Society for Philosophy 
and Psychology, St. Louis, Missouri, March 27, 1959. Permission is granted for reproduc- 
tion, translation, publication, use, and disposal in whole and in part by or for the United 
States Government. 

Multiply (1) through (5) by (1), sum over cases, divide by N, drop out 
zero values, and cancel the unit variances used as multipliers. Then substitute 
the communality $a_i^2$ for unity in the ith position of the resulting vector. 
The result is the vector i. Vector j is obtained in a similar manner by multiplying 
the same expressions by (2), then substituting $a_j^2$ for unity as the jth 
element. The two vectors are: 


(6) Vector i: $a_i^2,\; a_i a_j,\; a_i a_k,\; a_i a_l,\; \ldots,\; a_i a_n$ 
    Vector j: $a_i a_j,\; a_j^2,\; a_j a_k,\; a_j a_l,\; \ldots,\; a_j a_n$ 


It is seen that in vector i there is a constant multiplier, $a_i$, and in vector 
j another constant multiplier, $a_j$. Without the multipliers, the two series of 
terms are identical. Accordingly, the vectors correlate 1.00. It is also apparent 
that when any term in vector i is divided by the corresponding term in vector 
j the result is the ratio $a_i/a_j$. 

With empirical data the communalities $a_i^2$ and $a_j^2$ are unknown, but if 
equiproportionality of terms in the two vectors obtains, the Spearman formula 
cited above holds in that 


(7) $h_i^2 = r_{ij} r_{ik} / r_{jk} = a_i a_j a_i a_k / a_j a_k = a_i^2$. 
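
The invariance claimed for (7) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the loadings are hypothetical values chosen only for illustration, with i and j loaded on $g_a$ alone and k, l, n loaded on both $g_a$ and a second factor $g_b$.

```python
# Hypothetical loadings (assumed for illustration, not taken from the paper):
# i and j load only on g_a; k, l, n are factorially complex.
load_a = {"i": 0.8, "j": 0.6, "k": 0.5, "l": 0.7, "n": 0.4}
load_b = {"i": 0.0, "j": 0.0, "k": 0.6, "l": 0.3, "n": 0.5}


def r(x, y):
    """Correlation implied by two uncorrelated common factors."""
    return load_a[x] * load_a[y] + load_b[x] * load_b[y]


others = ["k", "l", "n"]

# Spearman's formula (7): h_i^2 = r_ij * r_ik / r_jk, one estimate per third variable.
h2_i = [r("i", "j") * r("i", x) / r("j", x) for x in others]

# Equiproportionality: ratios of corresponding elements of vectors i and j.
ratios = [r("i", x) / r("j", x) for x in others]

print(h2_i)    # every estimate is close to a_i^2 = 0.64
print(ratios)  # every ratio is close to a_i / a_j
```

Because i and j share only $g_a$, the second-factor loadings of the third variable cancel in both the triad and the ratio, which is why the estimates do not depend on the choice of k.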


Recently [4] it was pointed out that in this situation the correlation 
between any factorially complex variable k and the factor defined by variables 
i and j can be found from a form of the same Spearman formula as follows: 


(8) $r_{k g_a} = \sqrt{\dfrac{r_{ik} r_{jk}}{r_{ij}}} = \sqrt{\dfrac{a_i a_k \, a_j a_k}{a_i a_j}} = a_k$. 


Correlations of i and j with $g_a$ are, of course, the square roots of the respective 
communalities. 

As noted in the same reference, each variable can be modified so that 
it becomes uncorrelated with $g_a$ by subtracting from each coefficient the 
product of the two pertinent factor loadings, one of which is taken as a 
covariance, the other as a beta. Resulting coefficients are of residual variables, 
i.e., differences between obtained values and values predicted by an ordinary 
regression equation. If diagonal elements are analyzed, the two factor load- 
ings will be identical. As a result of the partialing out process, all variances 
and covariances of defining variables, such as the coefficients represented 
in (6), reduce to zero. All other coefficients in the matrix reduce to partial 
variances and covariances of original variables with variability associated 
with the common factor removed. Actually it is not necessary to work with 
the diagonals. Final communalities can be found by summing squares of 
factor loadings. 
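
The partialing-out step can be sketched as follows. This is an illustrative sketch under stated assumptions: the loadings are hypothetical, variables 0 and 1 play the role of the defining variables i and j, and communalities are placed in the diagonal so that the residual variances of the defining variables can be inspected as well.

```python
from math import sqrt

# Hypothetical loadings (assumed): variables 0 and 1 are loaded on g_a alone.
a = [0.8, 0.6, 0.5, 0.7, 0.4]
b = [0.0, 0.0, 0.6, 0.3, 0.5]
n = len(a)

# Correlation matrix with communalities in the diagonal.
r = [[a[x] * a[y] + b[x] * b[y] for y in range(n)] for x in range(n)]

i, j = 0, 1
load = [0.0] * n
# Defining variables: square roots of the communalities obtained from (7).
load[i] = sqrt(r[i][j] * r[i][2] / r[j][2])
load[j] = sqrt(r[i][j] * r[j][2] / r[i][2])
# Remaining variables: formula (8).
for k in range(2, n):
    load[k] = sqrt(r[i][k] * r[j][k] / r[i][j])

# Residual (partial) variances and covariances after g_a is removed.
resid = [[r[x][y] - load[x] * load[y] for y in range(n)] for x in range(n)]

for row in resid:
    print([round(v, 4) for v in row])
```

In this sketch every entry in the rows of the two defining variables is zero up to rounding, and the remaining entries equal the products of the hypothetical second-factor loadings.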
Factoring Guttman’s Simplex 


Guttman’s simplex model, [1-3, 5-15], readily yields to complete algebraic 
factorization by the principles stated above. In a simplex, any correlation is 
considered to be the ratio of two saturations, the saturation of the simpler 
variable divided by the saturation of the more complex variable. The matrix 
may have any number of variables. In the equally spaced, perfect simplex, 
correlations adjacent to the principal diagonal are identical. In the unequally 
spaced, perfect simplex, correlations adjacent to the diagonal are the highest 
in the matrix but are not identical. Other r’s are lower the farther they are 
from the diagonal. The present development is applicable to both varieties 
of the perfect simplex. For variables 1, 2, ..., 8, in order of increasing 
complexity, the matrix is given in Table 1. 

In the matrix two pairs of vectors meet the three criteria indicating 
that they define a single factor: vectors 1 and 2, and vectors 7 and 8. In both 
cases the correlation between vectors is 1.00; pairs of elements from the two 
vectors are proportional; and communality estimates are invariant. 

The matrix can be analyzed starting at either end. The simpler variables 
will be used to start the analysis. By (7) the communality of variable 1 is 


(9) $h_1^2 = \dfrac{r_{12} r_{13}}{r_{23}} = \dfrac{(a_1/a_2)(a_1/a_3)}{a_2/a_3} = \dfrac{a_1^2}{a_2^2}$; 


the communality of variable 2 is 


(10) $h_2^2 = \dfrac{r_{12} r_{23}}{r_{13}} = \dfrac{(a_1/a_2)(a_2/a_3)}{a_1/a_3} = 1.00$. 


The loadings of variables 1 and 2 in the first factor, $g_a$, are the square 
roots of these communalities. Factor loadings of the other variables are 
found from (8). (They may also be found, as in Thurstone’s diagonal method, 
merely by dividing $r_{1k}$ by $\sqrt{h_1^2}$ or $r_{2k}$ by $\sqrt{h_2^2}$.) In Table 1 it happens, 
because of the communality of 1.00 for variable 2, that $r_{k g_a} = r_{2k} = a_2/a_k$. 
The factor loadings in $g_a$ for the eight variables are given in Table 4. 

Table 2 displays the first factor residuals, or the partial covariances of 
variables after $g_a$ has been partialed out. All coefficients involving variable 
1 or variable 2 are, as would be expected, precisely .00. Thus 


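(11) $C_{12 \cdot g_a} = \dfrac{a_1}{a_2} - \dfrac{a_1}{a_2}(1.00) = .00$, 
(12) $C_{13 \cdot g_a} = \dfrac{a_1}{a_3} - \dfrac{a_1}{a_2} \cdot \dfrac{a_2}{a_3} = .00$, 
and so on down to 
(13) $C_{28 \cdot g_a} = \dfrac{a_2}{a_8} - (1.00)\dfrac{a_2}{a_8} = .00$. 


TABLE 1 

Formulas for the Correlation Coefficients in a Simplex 

$$
\begin{array}{c|cccccccc}
\text{Variables} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
1 & & a_1/a_2 & a_1/a_3 & a_1/a_4 & a_1/a_5 & a_1/a_6 & a_1/a_7 & a_1/a_8 \\
2 & & & a_2/a_3 & a_2/a_4 & a_2/a_5 & a_2/a_6 & a_2/a_7 & a_2/a_8 \\
3 & & & & a_3/a_4 & a_3/a_5 & a_3/a_6 & a_3/a_7 & a_3/a_8 \\
4 & & & & & a_4/a_5 & a_4/a_6 & a_4/a_7 & a_4/a_8 \\
5 & & & & & & a_5/a_6 & a_5/a_7 & a_5/a_8 \\
6 & & & & & & & a_6/a_7 & a_6/a_8 \\
7 & & & & & & & & a_7/a_8 \\
8 & & & & & & & &
\end{array}
$$
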
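TABLE 2 

Matrix of First Factor Residuals (Partial Covariances) 

$$
\begin{array}{c|cccccccc}
\text{Residual variables} & 1 \cdot g_a & 2 \cdot g_a & 3 \cdot g_a & 4 \cdot g_a & 5 \cdot g_a & 6 \cdot g_a & 7 \cdot g_a & 8 \cdot g_a \\ \hline
1 \cdot g_a & & .00 & .00 & .00 & .00 & .00 & .00 & .00 \\
2 \cdot g_a & & & .00 & .00 & .00 & .00 & .00 & .00 \\
3 \cdot g_a & & & & \frac{a_3^2 - a_2^2}{a_3 a_4} & \frac{a_3^2 - a_2^2}{a_3 a_5} & \frac{a_3^2 - a_2^2}{a_3 a_6} & \frac{a_3^2 - a_2^2}{a_3 a_7} & \frac{a_3^2 - a_2^2}{a_3 a_8} \\
4 \cdot g_a & & & & & \frac{a_4^2 - a_2^2}{a_4 a_5} & \frac{a_4^2 - a_2^2}{a_4 a_6} & \frac{a_4^2 - a_2^2}{a_4 a_7} & \frac{a_4^2 - a_2^2}{a_4 a_8} \\
5 \cdot g_a & & & & & & \frac{a_5^2 - a_2^2}{a_5 a_6} & \frac{a_5^2 - a_2^2}{a_5 a_7} & \frac{a_5^2 - a_2^2}{a_5 a_8} \\
6 \cdot g_a & & & & & & & \frac{a_6^2 - a_2^2}{a_6 a_7} & \frac{a_6^2 - a_2^2}{a_6 a_8} \\
7 \cdot g_a & & & & & & & & \frac{a_7^2 - a_2^2}{a_7 a_8}
\end{array}
$$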





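As an example of finding one of the partial covariances differing from 
zero, consider the computation of $C_{34 \cdot g_a}$. 


(14) $C_{34 \cdot g_a} = C_{34} - \beta_{3 g_a} C_{4 g_a} = r_{34} - r_{3 g_a} r_{4 g_a} = \dfrac{a_3}{a_4} - \dfrac{a_2}{a_3} \cdot \dfrac{a_2}{a_4} = \dfrac{a_3^2 - a_2^2}{a_3 a_4}$. 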





The matrix of partial covariances in Table 2 is a new simplex. Again, 
the higher coefficients are concentrated adjacent to the principal diagonal, 
and there is systematic decline in both dimensions away from that diagonal. 
Again, the first two vectors meet the criteria for defining a single factor, as 
do the final two vectors. Vectors 3 and 4 are used in defining the second factor, 
although vectors 7 and 8 might have been used instead. 


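TABLE 3 

Matrix of Non-Zero Second Factor Residuals (Partial Covariances) 

$$
\begin{array}{c|ccc}
\text{Variables} & 6 \cdot g_a g_b & 7 \cdot g_a g_b & 8 \cdot g_a g_b \\ \hline
5 \cdot g_a g_b & \frac{a_5^2 - a_4^2}{a_5 a_6} & \frac{a_5^2 - a_4^2}{a_5 a_7} & \frac{a_5^2 - a_4^2}{a_5 a_8} \\
6 \cdot g_a g_b & & \frac{a_6^2 - a_4^2}{a_6 a_7} & \frac{a_6^2 - a_4^2}{a_6 a_8} \\
7 \cdot g_a g_b & & & \frac{a_7^2 - a_4^2}{a_7 a_8}
\end{array}
$$
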
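TABLE 4 

Loadings of an 8-Variable Simplex in 4 Common Factors 

$$
\begin{array}{c|cccc|c}
\text{Variables} & g_a & g_b & g_c & g_d & \text{Communalities } h^2 \\ \hline
1 & a_1/a_2 & .00 & .00 & .00 & a_1^2/a_2^2 \\
2 & 1.00 & .00 & .00 & .00 & 1.00 \\
3 & a_2/a_3 & \frac{a_3^2 - a_2^2}{a_3 \sqrt{a_4^2 - a_2^2}} & .00 & .00 & \frac{a_2^2}{a_3^2} + \frac{(a_3^2 - a_2^2)^2}{a_3^2 (a_4^2 - a_2^2)} \\
4 & a_2/a_4 & \frac{1}{a_4}\sqrt{a_4^2 - a_2^2} & .00 & .00 & 1.00 \\
5 & a_2/a_5 & \frac{1}{a_5}\sqrt{a_4^2 - a_2^2} & \frac{a_5^2 - a_4^2}{a_5 \sqrt{a_6^2 - a_4^2}} & .00 & \frac{a_4^2}{a_5^2} + \frac{(a_5^2 - a_4^2)^2}{a_5^2 (a_6^2 - a_4^2)} \\
6 & a_2/a_6 & \frac{1}{a_6}\sqrt{a_4^2 - a_2^2} & \frac{1}{a_6}\sqrt{a_6^2 - a_4^2} & .00 & 1.00 \\
7 & a_2/a_7 & \frac{1}{a_7}\sqrt{a_4^2 - a_2^2} & \frac{1}{a_7}\sqrt{a_6^2 - a_4^2} & \frac{a_7^2 - a_6^2}{a_7 \sqrt{a_8^2 - a_6^2}} & \frac{a_6^2}{a_7^2} + \frac{(a_7^2 - a_6^2)^2}{a_7^2 (a_8^2 - a_6^2)} \\
8 & a_2/a_8 & \frac{1}{a_8}\sqrt{a_4^2 - a_2^2} & \frac{1}{a_8}\sqrt{a_6^2 - a_4^2} & \frac{1}{a_8}\sqrt{a_8^2 - a_6^2} & 1.00
\end{array}
$$


By applying (7) which, together with (8), is applicable to partial 
covariances of any order, the communalities are 


(15) $h_{3 \cdot g_a}^2 = \dfrac{[(a_3^2 - a_2^2)/a_3 a_4]\,[(a_3^2 - a_2^2)/a_3 a_5]}{(a_4^2 - a_2^2)/a_4 a_5} = \dfrac{(a_3^2 - a_2^2)^2}{a_3^2 (a_4^2 - a_2^2)}$, 

and 

(16) $h_{4 \cdot g_a}^2 = \dfrac{[(a_3^2 - a_2^2)/a_3 a_4]\,[(a_4^2 - a_2^2)/a_4 a_5]}{(a_3^2 - a_2^2)/a_3 a_5} = \dfrac{a_4^2 - a_2^2}{a_4^2}$. 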


Factor loadings of the defining variables are computed by extracting the 
square roots of the known communalities. 

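Factor loadings of the nondefining variables are computed by (8), 
using $3 \cdot g_a$ and $4 \cdot g_a$ as defining variables. Thus, 

$$
r_{(5 \cdot g_a) g_b} = \sqrt{\frac{[(a_3^2 - a_2^2)/a_3 a_5]\,[(a_4^2 - a_2^2)/a_4 a_5]}{(a_3^2 - a_2^2)/a_3 a_4}} = \sqrt{\frac{a_4^2 - a_2^2}{a_5^2}} = \frac{1}{a_5}\sqrt{a_4^2 - a_2^2}.
$$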

Second-factor loadings are reported in Table 4. All partial covariances 
in vectors 3 and 4 become zero when the second factor is partialed out. 
For example, 

(17) $C_{34 \cdot g_a g_b} = \dfrac{a_3^2 - a_2^2}{a_3 a_4} - \dfrac{a_3^2 - a_2^2}{a_3 \sqrt{a_4^2 - a_2^2}} \cdot \dfrac{\sqrt{a_4^2 - a_2^2}}{a_4}$ 

(18) $= \dfrac{a_3^2 - a_2^2}{a_3 a_4} - \dfrac{a_3^2 - a_2^2}{a_3 a_4} = .00$. 





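Partial covariances in other vectors are as shown in Table 3. To illustrate 
the computation of $C_{78 \cdot g_a g_b g_c}$: 

(19) $C_{78 \cdot g_a g_b g_c} = \dfrac{a_7^2 - a_4^2}{a_7 a_8} - \dfrac{\sqrt{a_6^2 - a_4^2}}{a_7} \cdot \dfrac{\sqrt{a_6^2 - a_4^2}}{a_8} = \dfrac{a_7^2 - a_4^2}{a_7 a_8} - \dfrac{a_6^2 - a_4^2}{a_7 a_8} = \dfrac{a_7^2 - a_6^2}{a_7 a_8}$. 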





The nonzero second-factor residuals displayed in Table 3 form a simplex 
in which vectors 5 and 6 define a third factor with loadings as reported in 
Table 4. The loadings of residual variables 7 and 8 on a fourth factor can be 
found either by extrapolation of the pattern of loadings previously observed 
or from a simplex with more than eight variables. 

If n is the number of variables composing a simplex, then n/2 is the 
number of conventional additive factors needed to reproduce it. The fractional 
remainder can be disregarded, since any three variables can be taken to 
define a factor and with an odd value of n, three variables will remain at the 
end. The factor loadings of these last three residual variables conform to the 
pattern of loadings of factors already extracted. By extrapolation, factor 
loadings for a simplex of any number of variables can readily be found. 

Since only n/2 factors are required to reconstitute a simplex of n variables, 
it is apparent that the rank of a simplex, with communalities in the diagonal, 
is n/2 rather than (n — 2) as asserted by Guttman [14]. Communalities of 
all variables, obtained by summing the squares of the factor loadings, are 
also shown in Table 4. Their pattern, with values of 1.00 for even-numbered 
variables, makes it unlikely that a rigorously perfect simplex will ever be 
discovered with real data. 
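
The claim that n/2 factors reproduce a perfect simplex can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure of the text: the saturations are hypothetical, and instead of applying (7) and (8) to pairs of vectors it uses the diagonal method mentioned parenthetically above, pivoting on the even-numbered variables with unities in the diagonal (which is admissible here because their communalities are 1.00).

```python
from math import sqrt

# Hypothetical saturations a_1..a_8 in order of increasing complexity (assumed).
sat = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
n = len(sat)

# Perfect simplex: r_ij = a_i / a_j for i < j, unities in the diagonal.
r = [[1.0 if i == j else min(sat[i], sat[j]) / max(sat[i], sat[j])
      for j in range(n)] for i in range(n)]


def extract(c, pivot):
    """One diagonal-method step: loadings on the pivot variable, then residuals."""
    root = sqrt(c[pivot][pivot])
    loading = [c[k][pivot] / root for k in range(n)]
    resid = [[c[i][j] - loading[i] * loading[j] for j in range(n)] for i in range(n)]
    return loading, resid


resid = r
loadings = []
for pivot in (1, 3, 5, 7):          # variables 2, 4, 6, 8 (zero-based indices)
    factor, resid = extract(resid, pivot)
    loadings.append(factor)

# Communalities: sums of squared loadings over the four factors.
h2 = [sum(f[k] ** 2 for f in loadings) for k in range(n)]
# Largest off-diagonal residual after four factors have been removed.
max_off = max(abs(resid[i][j]) for i in range(n) for j in range(n) if i != j)

print([round(v, 4) for v in h2])
print(max_off)
```

With these saturations the off-diagonal residuals vanish after four factors, and the even-numbered variables reach communalities of 1.00, in line with the pattern of Table 4.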


The Pseudo-Simplex 


Not all matrices that superficially look like a simplex can be considered 
as such. Consider, for example, the pseudo-simplex exhibited in Table 5. 
Like the simplex, its highest correlations are along the diagonal. Correlations 
decrease regularly, the farther they are from the diagonal. It also appears 
that the first two vectors correlate perfectly, as do the last two. 

However, if one applies the simplex formula for r, $r_{ij} = a_i/a_j$, trouble 
develops. Both $a_1/a_2$ and $a_2/a_3$ are .55. These values, when multiplied, yield 
an inferred value of .3025 for $a_1/a_3$ or $r_{13}$. However, the matrix value of $r_{13}$ 
is .45. It is apparent that some method other than dividing “saturations” is 
involved in the development of the pseudo-simplex. 
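
The arithmetic of this check is short enough to write out. The sketch below uses only the three values quoted from Table 5; the variable names are illustrative.

```python
# Values read from Table 5.
r12, r23, r13 = 0.55, 0.55, 0.45

# In a true simplex, a_1/a_3 = (a_1/a_2)(a_2/a_3), so r13 should equal r12 * r23.
implied_r13 = r12 * r23

print(round(implied_r13, 4))   # 0.3025
print(round(r13 - implied_r13, 4))
```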

TABLE 5 

A Pseudo-Simplex for k = .65 

            Saturation                  Variables 
Variable      $a_i$        1      2      3      4      5      6 
    1          .6                .55    .45    .35    .25    .15 
    2          .5         .55           .55    .45    .35    .25 
    3          .4         .45    .55           .55    .45    .35 
    4          .3         .35    .45    .55           .55    .45 
    5          .2         .25    .35    .45    .55           .55 


Both the simplex and the pseudo-simplex are developed from a single 
“saturation” for each variable, but the simplex correlations are ratios of 
saturations, while the pseudo-simplex involves subtractive functions, together 
with a constant k, which for this example has been chosen to be .65. The 
formula for each entry is $(k - a_i + a_j)$. Factorially, the pseudo-simplex 
seems to be more complex than the Guttman simplex. While the first two 
vectors correlate perfectly, they do not meet the criterion of equiproportionality 
of corresponding elements or the criterion of consistency of communality 
estimates of variables 1 and 2. Vectors 1 and 2 do not define a single common 
factor that can be extracted throughout the matrix. 
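
The three criteria can be checked on the pseudo-simplex directly. This is a minimal sketch under stated assumptions: the entries are generated from the formula $(k - a_i + a_j)$ rather than typed from Table 5, and the sixth saturation, .1, is inferred from that formula and the tabled value $r_{16} = .15$.

```python
k = 0.65
# Saturations a_1..a_6; the last value is inferred, not read from the table.
sat = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]


def r(i, j):
    """Pseudo-simplex entry k - a_i + a_j for the pair with i < j (zero-based)."""
    lo, hi = min(i, j), max(i, j)
    return k - sat[lo] + sat[hi]


others = range(2, 6)                      # variables 3..6
vec1 = [r(0, x) for x in others]
vec2 = [r(1, x) for x in others]

# A constant difference means the two vectors correlate 1.00 ...
diffs = [round(v2 - v1, 10) for v1, v2 in zip(vec1, vec2)]
# ... but the ratios are not constant, so equiproportionality fails ...
ratios = [round(v1 / v2, 4) for v1, v2 in zip(vec1, vec2)]
# ... and the triad estimates of h_1^2 from (7) are not invariant.
h2_1 = [round(r(0, 1) * r(0, x) / r(1, x), 4) for x in others]

print(diffs)
print(ratios)
print(h2_1)
```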


An Empirical Simplex 


Up through five variables, the examples of simplexes given by Guttman 
are in the conventional sense 2-factor matrices. Each can be analyzed by the 
variant of the diagonal method of factoring described above. Simple structure 
is reached directly without rotation, since the first two variables are taken as 
loaded exclusively in the first factor, the remaining variables in both factors. 
The degree to which empirical 5-variable matrices depart from this basic 
2-factor pattern is the degree to which they depart from the simplex. 

A factor analysis of a 6-variable simplex reported by Guttman [7] is 
shown in Table 6. First-factor loadings were computed by the use of (7) 
and (8). In the case of the loadings for variables a and b, correlations with 
variables c and d were used (and communality estimates averaged), but 
correlations with variables e and f were disregarded. When the first factor 
was partialed out, there were no partial covariances in vectors a and b in 
the resultant matrix greater than .03. Accordingly, vectors a and b were 
disregarded in computing second factor loadings, again by (7) and (8), with 
communality estimates averaged in finding the loadings for variables c and d. 


TABLE 6 
Analysis of a Simplex Abstracted by Guttman [7] from Thurstone [16] 

Zero Order Matrix 

                        Thurstone   First Factor 
Variables                  No.        Loadings        b       c       d       e       f 
a  Dot Counting I           14          .743        .699    .503    .202    .088    .010 
b  Dot Counting III         16          .94?                .598    .272    .153    .053 
c  Dot Counting II          15      [illegible]                     .455    .302    .236 
d  Pursuit                  4?      [illegible]                         [illegible] [illegible] 
e  Mazes I                  35          .13                                         .684 
f  Mazes II                 36          .02 

Matrix of First Order Partial Covariances 

             Second Factor 
Variables      Loadings        b        c        d        e        f      Communality 
a               .000        -.001     .016  [illegible] -.015    -.011       .552 
b               .000                 -.020     .00?     .022      .02?       .887 
c               .387                       [illegible]  .211      .21?       .580 
d               .700                                    .398  [illegible]    .569 
e               .555                                          [illegible]    .327* 
f               .556                                                         .310* 

Matrix of Second Order Partial Covariances 

Variables        d        e        f 
c              .000    -.00?     .00? 
d                       .008    -.008 
e                                .373 

N = 710 

*Plus square of loading in a third factor indicated by doublet between variables 
e and f. If this factor is assigned equally to the two variables, total 
communality of e is .700 and of f .683. 

In the matrix of second order partial covariances, vectors involving 
c and d have no elements greater than .01. It is apparent that two factors 
can account for all the common variance in variables a, b, c, and d, but that 
there is a third factor in variables e and f, represented by the second order 
partial covariance of .373. Since no unique factor loadings can be found for a 
doublet, it is taken merely as indicating the presence of a third factor. 

Communalities are also given in Table 6. As expected, the communality 
of variable b approaches unity. On the other hand, the communality of 
variable d does not follow the pattern of a true simplex. The rank of three 
for the matrix as a whole is exactly what analysis of a perfect simplex would 
indicate. There is no evidence of a rank of four as demanded by Guttman. 


DETERMINANTAL METHODS IN LATENT CLASS ANALYSIS 


ALBERT MADANSKY* 
THE RAND CORPORATION 


Some extensions of the existing determinantal methods for solving the 
accounting equations in latent class analysis are presented. These extensions 
cover more cases than previous methods, give rise to new sufficient conditions 
for identifiability of the latent class model, and give insight into the necessity 
of various sufficient conditions for identifiability. These implications to the 
identifiability problem are discussed. 


Suppose we have a set of K questions, each one answerable only by 
yes or no. Let $\pi_i$ be the probability that a person chosen at random from 
the population of interest (either finite or infinite) answers yes to question 
i, let $\pi_{ij}$ be the probability that an individual chosen at random from the 
population answers yes to questions i and j, and in general let $\pi_\sigma$ be the 
probability that an individual chosen at random from the population answers 
yes to all the questions indexed by members of $\sigma$, where $\sigma$ is a subset of the 
integers (1, 2, ..., K). 

Assume that the population is composed of m disjoint classes, wherein 
the answers to the K questions are independent within each class. Call 
these latent classes. Let $\nu^\alpha$ be the probability that a person chosen at random 
from the population be a member of the $\alpha$th latent class, where $\sum_{\alpha=1}^{m} \nu^\alpha = 1$. 
Let $\lambda_\sigma^\alpha$ be the probability that an individual chosen at random from the 
$\alpha$th class answers yes to all the questions indexed by members of $\sigma$. By the 
assumption of independence of answers to questions within each latent 
class, $\lambda_\sigma^\alpha = \prod_{\sigma_i \in \sigma} \lambda_{\sigma_i}^\alpha$, where the members of $\sigma$ are ordered in some way, 
and $\sigma_i$ is the ith member of $\sigma$. Then 

$$
\pi_\sigma = \sum_{\alpha=1}^{m} \nu^\alpha \lambda_\sigma^\alpha = \sum_{\alpha=1}^{m} \nu^\alpha \prod_{\sigma_i \in \sigma} \lambda_{\sigma_i}^\alpha .
$$

These equations are usually called the accounting equations. 
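
As an illustration of the accounting equations, the sketch below computes every manifest probability $\pi_\sigma$ from a small latent structure. The class proportions and item probabilities are hypothetical values chosen for the example, and items are indexed from zero in the code.

```python
from itertools import combinations
from math import prod

# Hypothetical latent structure (assumed): m = 2 classes, K = 3 items.
nu = [0.4, 0.6]                 # class proportions nu^alpha, summing to 1
lam = [[0.9, 0.8, 0.7],         # lam[alpha][i] = P(yes to item i | class alpha)
       [0.2, 0.3, 0.1]]
K = 3


def manifest(sigma):
    """pi_sigma = sum over classes of nu^alpha times the product of lambda_i^alpha, i in sigma."""
    return sum(nu[c] * prod(lam[c][i] for i in sigma) for c in range(len(nu)))


# One manifest probability per subset sigma of the items (the empty subset gives 1).
pi = {sigma: manifest(sigma)
      for size in range(K + 1)
      for sigma in combinations(range(K), size)}

for sigma, value in pi.items():
    print(sigma, round(value, 4))
```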


*This work, part of a doctoral dissertation submitted to the Department of Statistics 
of the University of Chicago, was made possible by The RAND Corporation and by a 
grant to the Department of Statistics of the University of Chicago by the Rockefeller 
Foundation. I gratefully acknowledge the comments of T. W. Anderson, W. H. Kruskal, 
and D. L. Wallace and especially the comments of L. A. Goodman on this work. 

Lazarsfeld and Dudman [5] and Koopmans [3] independently solved 
the accounting equations in the latent class model for all the latent parameters 
when $(K + 1)/2 \geq m$. Anderson [1] and Gibson [2] simplified this 
solution so that it involved only matrix manipulations. The conditions which 
they assume for their solution to exist and be unique are, then, sufficient 
conditions for identifiability. By identifiability of the latent parameters, 
we mean that for each set of manifest parameters, i.e., $\pi_\sigma$’s, there corresponds 
a unique set of latent parameters such that the two sets of parameters are 
related by the accounting equations. Clearly existence of a one-to-one 
correspondence between manifest and latent parameters is necessary in order to 
be able to describe the manifest situation uniquely by a latent structure 
(cf. [7] for further discussion of this concept). 

Gibson also extended the Anderson solution in two ways when 
$(K + 1)/2 \geq m$ to make use of more of the equations. In the first extension, 
he increased the number of equations used, but reduced by K the number of 
latent parameters determined in a single iteration of his procedure, though 
with several iterations all the latent parameters can be determined. In his 
second extension he gave another method of increasing the number of equations, 
which, if not used in conjunction with his first extension, can be used 
to determine all the latent parameters at once. Since Anderson’s solution 
may require as many as K - 2m + 2 iterations to obtain all the parameters 
and Gibson’s first extension requires more than one iteration, use of Gibson’s 
second extension can save appreciable work. 

Gibson’s second extension of Anderson’s procedure for $(K + 1)/2 \geq m$ 
(i.e., for $K - 2m + 2 \geq 1$) will be presented below. Also, the Anderson 
procedure will be generalized to the case where $2^{(K-1)/2} > m > (K + 1)/2$ 
and the necessity of both our and Anderson’s and Gibson’s sufficient conditions 
will be investigated. In our generalization Lazarsfeld’s [4] ascending matrices 
will be employed, rather than the basic matrices (defined by Lazarsfeld [4]) 
which Anderson used. These matrices are defined later; a basic matrix is a 
special type of ascending matrix, so that Anderson’s procedure will become 
a special case of ours. 

Lazarsfeld ([4], p. 23) gave, essentially, the Anderson solution of the 
accounting equations without stating the conditions under which the solution 
exists and is unique. He then applied this solution to the located class model, 
where (m - 1)/r = d is a positive integer and d < K/2. (r is the degree 
of the polynomial $\lambda_i^\alpha = \lambda_i^\alpha(x)$ in the located class model.) By this procedure 
Lazarsfeld implicitly obtained a generalization of Anderson’s solution of the 
latent structure equations, in that his method of solution of the accounting 
equations of the located class model can be carried over to the latent class 
model where m > (K + 1)/2. However, he did not apply this extension of 
the Anderson solution to the latent class model, nor did he state the conditions 
under which the generalization works in the located class model. Our 
extension of Anderson’s solution is also a generalization of Lazarsfeld’s 
extension as applied to the latent class model, as will be seen later. We did 
not, however, see [4] until after we had obtained, in more general form, the 
extension of Anderson’s solution given there. 


Sufficient Conditions for Identifiability 


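The notation used is that of [1]. We shall determine the $\nu$’s and $\lambda$’s from 
the $\pi$’s constructively. To illustrate the procedure, assume that $m = 2^{(K-1)/2}$, 
K odd, and let n = (K - 1)/2. Define 

$$
\Pi^* = \begin{bmatrix}
1 & \pi_1 & \cdots & \pi_n & \pi_{12} & \cdots & \pi_{n-1,n} & \pi_{123} & \cdots & \pi_{n-2,n-1,n} & \cdots & \pi_{12 \cdots n} \\
\pi_{n+1} \\
\vdots \\
\pi_{K-1} \\
\pi_{n+1,n+2} \\
\vdots & & & & & \cdots \\
\pi_{K-2,K-1} \\
\pi_{n+1,n+2,n+3} \\
\vdots \\
\pi_{K-3,K-2,K-1} \\
\vdots \\
\pi_{n+1,n+2,\cdots,K-1}
\end{bmatrix},
$$

where the (i, j)th $\pi$ has as a subscript the combined subscripts of the (i, 1)th 
and (1, j)th $\pi$. Define $\Pi$ as $\Pi^*$ with each element of $\Pi^*$ bearing the added 
subscript K, and the (1, 1)th element of $\Pi$ being $\pi_K$. Note that $\Pi$ and $\Pi^*$ 
have 

$$
1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n = 2^{(K-1)/2} = m
$$

rows and columns. Define 

$$
\Lambda_1 = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
\lambda_{n+1}^1 & \lambda_{n+1}^2 & \cdots & \lambda_{n+1}^m \\
\vdots & \vdots & & \vdots \\
\lambda_{K-1}^1 & \lambda_{K-1}^2 & \cdots & \lambda_{K-1}^m \\
\lambda_{n+1}^1 \lambda_{n+2}^1 & \lambda_{n+1}^2 \lambda_{n+2}^2 & \cdots & \lambda_{n+1}^m \lambda_{n+2}^m \\
\vdots & \vdots & & \vdots \\
\lambda_{K-2}^1 \lambda_{K-1}^1 & \lambda_{K-2}^2 \lambda_{K-1}^2 & \cdots & \lambda_{K-2}^m \lambda_{K-1}^m \\
\lambda_{n+1}^1 \lambda_{n+2}^1 \lambda_{n+3}^1 & \lambda_{n+1}^2 \lambda_{n+2}^2 \lambda_{n+3}^2 & \cdots & \lambda_{n+1}^m \lambda_{n+2}^m \lambda_{n+3}^m \\
\vdots & \vdots & & \vdots \\
\lambda_{K-3}^1 \lambda_{K-2}^1 \lambda_{K-1}^1 & \lambda_{K-3}^2 \lambda_{K-2}^2 \lambda_{K-1}^2 & \cdots & \lambda_{K-3}^m \lambda_{K-2}^m \lambda_{K-1}^m \\
\vdots & \vdots & & \vdots \\
\lambda_{n+1}^1 \lambda_{n+2}^1 \cdots \lambda_{K-1}^1 & \lambda_{n+1}^2 \lambda_{n+2}^2 \cdots \lambda_{K-1}^2 & \cdots & \lambda_{n+1}^m \lambda_{n+2}^m \cdots \lambda_{K-1}^m
\end{bmatrix}.
$$

Define $\Lambda_2$ as $\Lambda_1$ with first row unchanged and with n subtracted from each 
subscript of the elements in the other rows. The rule of formation of $\Lambda_1$ 
and $\Lambda_2$ based on the first column and row of $\Pi^*$ is evident. Define 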


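$$
N = \begin{bmatrix} \nu^1 & & & 0 \\ & \nu^2 & & \\ & & \ddots & \\ 0 & & & \nu^m \end{bmatrix}
\quad \text{and} \quad
\Delta = \begin{bmatrix} \lambda_K^1 & & & 0 \\ & \lambda_K^2 & & \\ & & \ddots & \\ 0 & & & \lambda_K^m \end{bmatrix}.
$$

$\Lambda_1$ is of size $2^{(K-1)/2} \times m = m \times m$, $\Lambda_2$ is of size $2^{(K-1)/2} \times m = m \times m$, N 
is of size $m \times m$, and $\Delta$ is of size $m \times m$. If the latent structure model holds, 
then it can be easily verified that $\Pi^* = \Lambda_1 N \Lambda_2'$ and $\Pi = \Lambda_1 N \Delta \Lambda_2'$. 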
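
The factorization can be verified numerically for a small case. The sketch below takes K = 5, n = 2, m = 4 with hypothetical latent parameters; the subset ordering (by size, then lexicographic) and all function names are illustrative choices, not notation from [1].

```python
from itertools import combinations
from math import prod

K, n, m = 5, 2, 4                       # m = 2 ** ((K - 1) // 2)
nu = [0.1, 0.2, 0.3, 0.4]               # hypothetical class proportions
lam = [[0.9, 0.2, 0.6, 0.3, 0.8],       # lam[alpha][item - 1], hypothetical
       [0.1, 0.7, 0.4, 0.9, 0.5],
       [0.5, 0.5, 0.8, 0.2, 0.3],
       [0.3, 0.9, 0.1, 0.6, 0.6]]


def subsets(items):
    """All subsets of the items, ordered by size as in the first row and column."""
    return [s for size in range(len(items) + 1) for s in combinations(items, size)]


def lam_prod(alpha, sigma):
    return prod(lam[alpha][i - 1] for i in sigma)


def pi(sigma):
    """Accounting equation for the manifest probability pi_sigma."""
    return sum(nu[alpha] * lam_prod(alpha, sigma) for alpha in range(m))


cols = subsets(range(1, n + 1))         # first-row subscripts: items 1..n
rows = subsets(range(n + 1, K))         # first-column subscripts: items n+1..K-1

pi_star = [[pi(r + c) for c in cols] for r in rows]
pi_strat = [[pi(r + c + (K,)) for c in cols] for r in rows]   # stratifier K added

lam1 = [[lam_prod(alpha, r) for alpha in range(m)] for r in rows]
lam2 = [[lam_prod(alpha, c) for alpha in range(m)] for c in cols]


def factored(use_delta):
    """Lambda_1 N Lambda_2' or, with use_delta, Lambda_1 N Delta Lambda_2'."""
    return [[sum(lam1[i][a] * nu[a] * (lam[a][K - 1] if use_delta else 1.0) * lam2[j][a]
                 for a in range(m))
             for j in range(m)] for i in range(m)]


err_star = max(abs(pi_star[i][j] - factored(False)[i][j]) for i in range(m) for j in range(m))
err_strat = max(abs(pi_strat[i][j] - factored(True)[i][j]) for i in range(m) for j in range(m))
print(err_star, err_strat)
```

Both discrepancies are zero up to rounding, since each entry of the product is the accounting equation for the combined subscript of its row and column.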

The solution of these equations is exactly the same as Anderson’s procedure 
given for the similar equations in [1]. As in [1], it is required that 
$\Lambda_1$, $\Lambda_2$, and N be nonsingular, that no two diagonal elements of $\Delta$ be equal, 
and that the classes be indexed so that $\lambda_K^1 < \lambda_K^2 < \cdots < \lambda_K^m$. 
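
For the smallest case, K = 3 and m = 2, the determinantal equation $|\Pi - \theta \Pi^*| = 0$ is a quadratic in $\theta$ and can be solved in closed form. The sketch below is only an illustration of that special case with hypothetical latent parameters; it is not the general matrix procedure of [1].

```python
from math import sqrt

# Hypothetical two-class structure with K = 3 items; item 3 is the stratifier.
nu = [0.4, 0.6]
lam = {1: [0.9, 0.2], 2: [0.8, 0.3], 3: [0.7, 0.1]}    # lam[item][class]


def pi(*items):
    """Manifest probability of answering yes to all listed items."""
    total = 0.0
    for c in range(2):
        term = nu[c]
        for i in items:
            term *= lam[i][c]
        total += term
    return total


# Pi* = [[1, pi_1], [pi_2, pi_12]] and Pi = [[pi_3, pi_13], [pi_23, pi_123]].
# Expanding |Pi - theta Pi*| = 0 gives A theta^2 + B theta + C = 0.
A = pi(1, 2) - pi(1) * pi(2)
B = pi(1) * pi(2, 3) + pi(2) * pi(1, 3) - pi(3) * pi(1, 2) - pi(1, 2, 3)
C = pi(3) * pi(1, 2, 3) - pi(1, 3) * pi(2, 3)

disc = sqrt(B * B - 4 * A * C)
roots = sorted([(-B - disc) / (2 * A), (-B + disc) / (2 * A)])
print(roots)   # close to the stratifier's two lambdas, in ascending order
```

The coefficient A is the covariance of items 1 and 2; it is nonzero here because both items discriminate between the two classes, which corresponds to the nonsingularity required of $\Lambda_1$, $\Lambda_2$, and N.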

A remark should be made about the indexing of the $\pi$’s which make 
up $\Pi^*$ and $\Pi$. Call the subscript K which is added to the members of $\Pi^*$ 
to form $\Pi$ the stratifier. From [1], the stratifier should be chosen so that 
$\lambda_K^\alpha \neq \lambda_K^\beta$ for all $(\alpha, \beta)$, $\alpha \neq \beta$. We also partitioned the remaining K - 1 indices 
into two parts, arbitrarily called 1, ..., n = (K - 1)/2 and n + 1, ..., 
K - 1. This indexing need not be the same as the indexing of the items of 
the questionnaire, say. From the discussion above, it is seen that the partition 
should be chosen such that the corresponding $\Lambda_1$ and $\Lambda_2$ be nonsingular. 
There may be more than one stratification and partition which satisfy the 
above requirements. By algebraic consistency, use of any one of them will 
lead to the same set of $\nu$’s and $\lambda$’s. We thus have the following theorem. 


THEOREM 1. If $m = 2^{(K-1)/2}$ and there exists some stratification and partition 
such that $|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$, $\lambda_K^\alpha \neq \lambda_K^\beta$ for all $(\alpha, \beta)$, $\alpha \neq \beta$, and if $|N| \neq 0$, 
then the latent class model is identifiable. 


Since only cases where m is known are to be considered, it is assumed that all 
$\nu^\alpha > 0$, so that $|N| \neq 0$ by convention. 

Lazarsfeld defines ascending matrices as those having the form $\Pi$ and 
$\Pi^*$ and defines a basic matrix as the upper left-hand (n + 1) × (n + 1) 
corner of $\Pi^*$ (or any principal sub-matrix thereof). Anderson proved Theorem 
1 for basic matrices, and hence he makes the restriction $(K + 1)/2 \geq m$. 

Lazarsfeld defines the level of a $\pi$ as the number of items in its subscript, 
and says ([4], p. 16) that “in the DC [discrete latent class] case we can, of 
course, also form ascending matrices which have (c - 1) entries on each 
level, c being the number of latent classes.” It is to ascending matrices of 
this type that Lazarsfeld refers when he gives a solution to the located class 
model. Note that our ascending matrices are not of Lazarsfeld’s form since 
ascending matrices with (m - 1) entries on each level are not used. Lazarsfeld 
puts this restriction on the ascending matrices since he uses them only in 
the located class model. This restriction is not necessary in the latent class 
model, as has been shown. 

If $2^{(K-1)/2} > m > (K + 1)/2$, then a modification of Anderson’s procedure 
will determine the $\lambda$’s and $\nu$’s. This modification consists of constructing 
$\tilde{\Pi}^*$, a matrix similar to $\Pi^*$, by including [(K - 1)/2] $\pi$’s in the first row and 
[K/2] $\pi$’s in the first column, and then completing the first row and column 
by adding 2nd order, 3rd order, etc., $\pi$’s (in the manner in which they are 
added to $\Pi^*$ above, when $m = 2^{(K-1)/2}$) until the first row and column are 
of length m, where [x] is the greatest integer value in x. [It is not really 
necessary to complete $\tilde{\Pi}^*$ by using $\pi$’s with lower order subscripts in preference 
to $\pi$’s with higher order subscripts. Any completion such that $\tilde{\Lambda}_1$ and 
$\tilde{\Lambda}_2$ (to be defined later) are nonsingular will do. We merely state the above 
manner of completion of $\tilde{\Pi}^*$ to give a definite procedure and because it is 
conventional to use this procedure.] 

The rest of the elements of $\tilde{\Pi}^*$ are determined in the same manner as 
are the rest of the elements of $\Pi^*$, and $\tilde{\Pi}$ is determined by using the Kth 
item as a stratifier. Since $m < 2^{(K-1)/2}$ and the maximum number of elements 
in the first row and column of the matrices constructed is $2^{(K-1)/2}$, both 
$\tilde{\Pi}$ and $\tilde{\Pi}^*$ are $m \times m$ matrices. Also, $\tilde{\Pi}$ and $\tilde{\Pi}^*$ can be expressed as 
products of N, $\Delta$, and appropriately constructed matrices of the same form 
as $\Lambda_1$ and $\Lambda_2$ above, say $\tilde{\Lambda}_1$ and $\tilde{\Lambda}_2$. 

Define n = (K - 1)/2, and, as before, let the subscripts of the $\lambda$’s 
in a given row of $\tilde{\Lambda}_1$ correspond to the subscripts of the $\pi$ in that row and in 
the first column of $\tilde{\Pi}^*$, and let the subscripts of the $\lambda$’s in a given row of 
$\tilde{\Lambda}_2$ correspond to the subscripts of the $\pi$ in that column and in the first row 
of $\tilde{\Pi}^*$. Then it is evident that $\tilde{\Pi}^* = \tilde{\Lambda}_1 N \tilde{\Lambda}_2'$ and $\tilde{\Pi} = \tilde{\Lambda}_1 N \Delta \tilde{\Lambda}_2'$. 

Since, in the case where $m < 2^{(K-1)/2}$, $\tilde{\Pi}$ and $\tilde{\Pi}^*$ can be factored in 
the same manner as $\Pi$ and $\Pi^*$ above so that these equations can be solved 
for $\tilde{\Lambda}_1$, $\tilde{\Lambda}_2$, and N by Anderson’s procedure, we have Theorem 2. 


THEOREM 2. If $m < 2^{(K-1)/2}$ and there exists some stratification, partition, 
and completion such that $|\tilde{\Lambda}_1| \neq 0$, $|\tilde{\Lambda}_2| \neq 0$ and $\lambda_K^\alpha \neq \lambda_K^\beta$, $\alpha \neq \beta$, then the 
latent class model is identifiable. 


For the special case $m = 2^{(K-2)/2}$, K even, we can make use of all $\pi$’s 
in determining the $\nu$’s and $\lambda$’s in the following way. Take n = (K/2) - 1. 
Then partition the $\pi$’s to form $\tilde{\Pi}^*$ by using $\pi_1, \ldots, \pi_n$ in the first row and 
$\pi_{n+1}, \ldots, \pi_{K-2}$ in the first column. Fill in the first row with 2nd order, 3rd 
order, etc., $\pi$’s with subscripts the various combinations of 1, ..., n. Fill in 
the first column similarly, and then determine the elements of the rest of 
$\tilde{\Pi}^*$ as done above for $\Pi^*$. Then $\tilde{\Pi}^*$ will have 

$$
1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n = m
$$

rows and m columns. Only (K - 2)/2 + (K - 2)/2 = K - 2 $\pi_i$’s are 
accounted for. We can, then, use both items K - 1 and K as stratifiers and 
obtain two different $\tilde{\Pi}$’s, $\tilde{\Pi}_{K-1}$ and $\tilde{\Pi}_K$, say. Using the above factorization 
method on $\tilde{\Pi}^*$ and $\tilde{\Pi}_{K-1}$ determine $\tilde{\Lambda}_1$, N, $\tilde{\Lambda}_2$, and $\lambda_{K-1}^1, \ldots, \lambda_{K-1}^m$; using 
it on $\tilde{\Pi}^*$ and $\tilde{\Pi}_K$ determine $\lambda_K^1, \ldots, \lambda_K^m$ (and once again $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$), 
where $\tilde{\Lambda}_1$ and $\tilde{\Lambda}_2$ are constructed from $\tilde{\Pi}^*$ in the same manner as $\Lambda_1$ and $\Lambda_2$ 
were constructed based on $\Pi^*$ above. If the model holds, then by algebraic 
consistency $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$ as determined by each stratification will be identical, 
except for a possible permutation of some columns, as will be seen later. 

This leads to a general method of solving the latent structure equations 
using more $\pi$’s than the procedure associated with Theorem 2 when $m = 2^n$, 
n < (K - 1)/2. Let $\tilde{\Pi}^*$ be determined by the stratification ($\pi_1, \ldots, \pi_n$; 
$\pi_{n+1}, \ldots, \pi_{2n}$), and define $\tilde{\Pi}_j$ as $\tilde{\Pi}^*$ with j as the added subscript (j = 2n + 1, 
..., K). There are K - 2n determinantal equations $|\tilde{\Pi}_j - \theta \tilde{\Pi}^*| = 0$, 
where each does not involve all the $\pi$’s but in toto they involve all the $\pi$’s 
and whereby one can determine $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$ K - 2n times (if they are 
nonsingular and $\lambda_j^\alpha \neq \lambda_j^\beta$, $\alpha \neq \beta$ for each j = 2n + 1, ..., K) and $\lambda_j^1, \ldots, 
\lambda_j^m$ once for each j = 2n + 1, ..., K. 

It is necessary to determine $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$ more than once not only as a 
computational check and to reduce rounding error by averaging but also 
because of the following considerations. The $\theta$’s are ordered by magnitude in 
determining $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$. It may be that two different stratifiers will yield 
two different orderings of the latent classes. What one does in this case is 
determine $\tilde{\Lambda}_1$, N, and $\tilde{\Lambda}_2$ for each of the stratifications and then permute 
columns of one of these matrices, say $\tilde{\Lambda}_1$, determined from the second 
stratification until it is identical to that determined from the first stratification. 
The columns of the other matrices, including $\Delta$ based on the second 
stratification are then permuted in the same manner as $\tilde{\Lambda}_1$ was. $\tilde{\Lambda}_2$ and N will then 
be identical with $\tilde{\Lambda}_2$ and N of the first stratification and the $\Delta$ of the second 
stratification will have its elements in the order of the latent classes induced 
by the first stratification. 
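
The column-matching step can be sketched as follows. This is an illustrative sketch, not a procedure given in the text: the function names, the tolerance, and the small matrices are all hypothetical, and exact agreement of columns is assumed, as holds for population values when the model is true.

```python
def matching_permutation(reference, other, tol=1e-9):
    """Return perm such that column perm[c] of `other` equals column c of `reference`."""
    ncols = len(reference[0])
    perm = []
    for c in range(ncols):
        ref_col = [row[c] for row in reference]
        for d in range(ncols):
            if d not in perm and all(abs(row[d] - x) <= tol for row, x in zip(other, ref_col)):
                perm.append(d)
                break
        else:
            raise ValueError("no matching column; the two solutions are inconsistent")
    return perm


def permute_columns(matrix, perm):
    return [[row[d] for d in perm] for row in matrix]


# Hypothetical Lambda_1 from two stratifications; the second orders the classes differently.
lam1_first = [[1.0, 1.0, 1.0], [0.2, 0.5, 0.9], [0.7, 0.3, 0.4]]
lam1_second = [[1.0, 1.0, 1.0], [0.9, 0.2, 0.5], [0.4, 0.7, 0.3]]
delta_second = [0.6, 0.1, 0.8]           # diagonal of Delta for the second stratifier

perm = matching_permutation(lam1_first, lam1_second)
aligned_lam1 = permute_columns(lam1_second, perm)
aligned_delta = [delta_second[d] for d in perm]

print(perm)
print(aligned_delta)
```

The same permutation is applied to every matrix from the second stratification, so that its $\Delta$ ends up in the class order induced by the first stratification.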

This method is summarized by the set of sufficient conditions for iden- 
tifiability of the latent class parameters given in the following theorem. 


THEOREM 3. If $m = 2^n$, n < (K - 1)/2 and there exists a subset of items 
{j; j = 2n + 1, ..., K} where $\lambda_j^\alpha \neq \lambda_j^\beta$, $\alpha \neq \beta$, for all such j and there exists 
a partition (1, ..., n; n + 1, ..., 2n) of the remaining items such that the 
corresponding $\tilde{\Lambda}_1$ and $\tilde{\Lambda}_2$ are nonsingular, then the latent class model is identifiable. 


Note that by this procedure we use more than the number of $\pi$’s required 
by the procedure connected with Theorem 2. However, this procedure does 
not concomitantly yield a method of estimating the $\nu$’s and $\lambda$’s via estimates 
of the $\pi$’s as the other procedures did. The reason for this is that due to 
sampling error the estimates of the $\tilde{\Lambda}_1$ based on different stratifications will 
not have the property of being identical except for permutations of columns. 
This procedure is stated because juxtaposition of the procedures of Theorems 
2 and 3 will give some insight into the necessity of these sufficient conditions 
for identifiability. 

The following theorem, of interest in itself as a sufficient condition for 
identifiability, is also of interest in that it will lend some insight into the 
necessity of various sufficient conditions for identifiability. 


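THEOREM 4. Let s be an item with $\lambda_s^\alpha = \lambda_s^\beta$ where $\alpha \neq \beta$, $\alpha, \beta = 1, \ldots, t$, 
$t \geq 2$. Let $\lambda_{s'}^\alpha \neq \lambda_{s'}^\beta$ for $\alpha \neq \beta$, $\alpha, \beta = 1, \ldots, t$, all $s' \neq s$. Then if (1) there 
exists a partition for this stratifier, s, such that $|\Lambda_1^{(1)}| \neq 0$, $|\Lambda_2^{(1)}| \neq 0$, where 
$\Lambda_1^{(1)}$ and $\Lambda_2^{(1)}$ are the $\Lambda_1$ and $\Lambda_2$ matrices for this first stratification, and (2) 
there exists another stratifier, $s^*$, such that (a) for the above t classes $\lambda_{s^*}^\alpha \neq \lambda_{s^*}^\beta$ 
for $\alpha \neq \beta$, $\alpha, \beta = 1, 2, \ldots, t$, and for $t'$ other classes $\lambda_{s^*}^\gamma = \lambda_{s^*}^\delta$, $\gamma, \delta = t + 1, 
\ldots, t + t'$, and (b) there exists a partition for this stratification such that (i) 
the first t columns of $\Lambda_1^{(2)}$ and $\Lambda_2^{(2)}$ are the same as the first t columns of the $\Lambda_1^{(1)}$ 
and $\Lambda_2^{(1)}$ in condition (1), and (ii) these two matrices, $\Lambda_1^{(2)}$ and $\Lambda_2^{(2)}$ are 
nonsingular, then the latent class model is identifiable. 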


PROOF. By assumption (1) of the theorem, the $\lambda_s^{\alpha}$'s are just the characteristic
roots of the matrix $\Pi^{*-1}\Pi$ defined by the partition for which $|\Lambda_1^{(1)}| \neq 0$,
$|\Lambda_2^{(1)}| \neq 0$. They are ordered by having the first $t$ classes correspond to the $t$
roots which are equal and the rest of the classes ordered by magnitude of
the rest of the roots, as before. Then we have the relation

$$\Pi^{*-1}\Pi\, X D_1 = (\Lambda_2^{(1)\prime})^{-1} \Delta^{(1)} \Lambda_2^{(1)\prime}\, X D_1 = X D_1 \Delta^{(1)},$$

where $X$ is a matrix of characteristic vectors for the characteristic roots, the
elements of $\Delta^{(1)}$ based on the first stratification, and $D_1$ is of the form

$$D_1 = \begin{pmatrix} A & 0 \\ 0 & D \end{pmatrix},$$

where $A$ is some (unknown) $t \times t$ nonsingular matrix, and $D$ is an $(m - t) \times
(m - t)$ diagonal matrix. Then $(\Lambda_2^{(1)\prime})^{-1} = X D_1$, so $\Lambda_2^{(1)\prime} = D_1^{-1} X^{-1}$. Now the
elements of $D^{-1}$ can be determined from the first column of $X^{-1}$, because the
first column of $\Lambda_2^{(1)\prime}$ consists of 1's. In fact,
$d^{ii} = 1/x^{i1}$, where $x^{i1}$ is the element in the $i$th row, first column of $X^{-1}$,
and $d^{ii}$ is the element in the corresponding row of $D^{-1}$, and where $i \geq t + 1$.
Hence the last $m - t$ rows of $\Lambda_2^{(1)\prime}$ are determined from $D$ and $X$, but since
$A$ cannot be determined, the first $t$ rows of $\Lambda_2^{(1)\prime}$ are not determined.

If one considers post-multiplying $\Pi\Pi^{*-1}$ by its characteristic vectors
corresponding to its characteristic roots (once again the elements of $\Delta^{(1)}$) one
obtains by similar analysis as above the relation

$$\Pi\Pi^{*-1}\, Y F_1 = \Lambda_1^{(1)} \Delta^{(1)} (\Lambda_1^{(1)})^{-1}\, Y F_1 = Y F_1 \Delta^{(1)}.$$

Therefore $\Lambda_1^{(1)} = Y F_1$, and so $\Lambda_1^{(1)\prime} = F_1' Y'$, where $Y$ is a matrix of characteristic
vectors of $\Pi\Pi^{*-1}$ and $F_1$ is a matrix like $D_1$ above. Hence, as above,
the last $m - t$ rows of $\Lambda_1^{(1)\prime}$ may be determined, but not the first $t$ rows.

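Now by assumption (2) of the theorem (noticing that the first $t$ rows of
$\Lambda_1^{(2)\prime}$, $\Lambda_2^{(2)\prime}$ are the first $t$ columns of $\Lambda_1^{(1)}$, $\Lambda_2^{(1)}$) and by the procedure used to
find the last $m - t$ rows of $\Lambda_1^{(1)\prime}$, $\Lambda_2^{(1)\prime}$, the first $t$ columns of $\Lambda_1^{(1)}$, $\Lambda_2^{(1)}$ are
determined. Q.E.D.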
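An obvious modification of the conditions of Theorem 4 will yield
sufficient conditions for identifiability in more complicated cases, such as
when there exist $t_1$, $t_2$ such that

(1) $\lambda_s^{\alpha} = \lambda_s^{\beta}$, $\alpha \neq \beta$, $\alpha, \beta = 1, 2, \cdots, t_1$;
(2) $\lambda_s^{\gamma} = \lambda_s^{\delta}$, $\gamma \neq \delta$, $\gamma, \delta = t_1 + 1, \cdots, t_1 + t_2$;
(3) $\lambda_s^{\alpha} \neq \lambda_s^{\gamma}$;
(4) $\lambda_s^{\epsilon} \neq \lambda_s^{\zeta}$, $\epsilon \neq \zeta$, $\epsilon, \zeta \geq t_1 + t_2 + 1$;
(5) $\lambda_s^{\epsilon} \neq \lambda_s^{\alpha}$, $\lambda_s^{\epsilon} \neq \lambda_s^{\gamma}$, $\epsilon \geq t_1 + t_2 + 1$.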


The latent parameters can also sometimes be determined by the following
procedure. Assume that for each stratifier there exists a partition such
that $|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$ for this partition. Then, except for making the
correspondence between a given $\lambda$ and its latent class, determine all the $\lambda$'s by
solving $K$ determinantal equations of the form $|\Pi - \theta\Pi^*| = 0$. There are
$(m!)^{K-1}$ different permutations of these $\lambda$'s, given that the latent classes are
ordered by the magnitude of the first of the $K$ stratifiers. List all $(m!)^{K-1}$
permutations of the $\lambda$'s. For each permutation form the $2^K \times m$ matrix $\lambda$ of
$\lambda$'s (with first row the vector $(1, 1, \cdots, 1)$). Now $\pi = \lambda\nu$, where $\pi$ is the
$2^K \times 1$ vector of $\pi$'s (with first element equal to 1) and $\nu$ is the $m \times 1$ vector of unknown
$\nu_{\alpha}$'s. If for all $(m!)^{K-1}$ permutations of the $\lambda$'s, $|\lambda'\lambda| \neq 0$, and if for only
one permutation the elements of $\nu = (\lambda'\lambda)^{-1}\lambda'\pi$ satisfy the conditions
$\sum_{\alpha} \nu_{\alpha} = 1$ and $0 < \nu_{\alpha} < 1$, then the model is identifiable, the $\nu_{\alpha}$'s are determined,
and the correspondence of the $\lambda$'s and the latent classes is determined. Call
this procedure the permutation procedure.
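
The permutation procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter values, the function names (`design_matrix`, `solve_nu`, `permutation_procedure`), and the convention that each row of the $2^K \times m$ matrix is the product of the $\lambda$'s over one subset of items are choices made here, not taken from the text. Exact fractions stand in for the population $\pi$'s, so sampling error is ignored.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def design_matrix(lam):
    """One row per subset of items (the empty subset first, giving the row of 1's).
    The entry for class a is the product of lam[i][a] over the items i in the subset."""
    n_items, m = len(lam), len(lam[0])
    rows = []
    for size in range(n_items + 1):
        for subset in combinations(range(n_items), size):
            row = [Fraction(1)] * m
            for i in subset:
                row = [r * lam[i][a] for a, r in enumerate(row)]
            rows.append(row)
    return rows


def solve_nu(L, pi):
    """Solve the normal equations (L'L) nu = L'pi by exact Gauss-Jordan elimination."""
    m = len(L[0])
    aug = [[sum(r[a] * r[b] for r in L) for b in range(m)]
           + [sum(r[a] * p for r, p in zip(L, pi))] for a in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if aug[r][c] != 0)  # assumes |L'L| != 0
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [x / scale for x in aug[c]]
        for r in range(m):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[m] for row in aug]


def admissible(nu):
    """The conditions of the text: the nu's sum to 1 and lie strictly between 0 and 1."""
    return sum(nu) == 1 and all(0 < v < 1 for v in nu)


def permutation_procedure(lam, pi):
    """Try every relabelling of the classes for items 2..K (item 1 fixes the order)."""
    m = len(lam[0])
    hits = []
    for perms in product(permutations(range(m)), repeat=len(lam) - 1):
        trial = [lam[0]] + [[row[a] for a in p] for row, p in zip(lam[1:], perms)]
        nu = solve_nu(design_matrix(trial), pi)
        if admissible(nu):
            hits.append((perms, nu))
    return hits


# Hypothetical parameters for K = 3 items and m = 2 classes.
lam = [[Fraction(1, 5), Fraction(4, 5)],
       [Fraction(3, 10), Fraction(7, 10)],
       [Fraction(1, 2), Fraction(9, 10)]]
nu_true = [Fraction(2, 5), Fraction(3, 5)]
pi = [sum(x * v for x, v in zip(row, nu_true)) for row in design_matrix(lam)]

hits = permutation_procedure(lam, pi)
for perms, nu in hits:
    print(perms, nu)
```

The model would be declared identifiable by this procedure only if `hits` contains exactly one entry; the sketch guarantees only that the correct matching is among the admissible ones.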

If $(K + 1)/2 > m > 1$, at least one $m \times m$ matrix $\Pi^*$ and at most
$K - 2m + 2$ $\Pi$'s based on the remaining $K - 2m + 2$ items must be formed
and at most $K - 2m + 2$ determinantal equations must be solved via
Anderson's procedure for $\Lambda_1$, $\Lambda_2$, and $N$. Henceforth consider the extreme
case, where there is only one partition such that $\Lambda_1$ and $\Lambda_2$ are nonsingular
and that $\lambda_i^{\alpha} \neq \lambda_i^{\beta}$, $\alpha \neq \beta$, for $i = 2m - 1, \cdots, K$, i.e., for each of the
remaining $K - 2m + 2$ items. Gibson suggests that, instead of using Anderson's
procedure $K - 2m + 2$ times, $\Pi^*$ be augmented by adding rows whose first
elements will be the $\pi_i$'s which are not included in the original $\Pi^*$ to form a
new matrix $\tilde{\Pi}^*$, and one stratifying item be saved to determine $\tilde{\Pi}$ from $\tilde{\Pi}^*$
in the same manner as $\Pi$ is determined from $\Pi^*$. $\tilde{\Pi}$ and $\tilde{\Pi}^*$ will now be
$(K - m + 1) \times m$ matrices. Write $\tilde{\Pi}^* = \tilde{\Lambda}_1 N \Lambda_2'$, $\tilde{\Pi} = \tilde{\Lambda}_1 N \Delta \Lambda_2'$, where
the sizes of $\Delta$, $N$, and $\Lambda_2$ remain the same but where $\tilde{\Lambda}_1$ is a $(K - m + 1) \times m$
matrix determined from the rows of $\tilde{\Pi}$ in the same fashion as $\Lambda_1$ is determined
from the rows of $\Pi$ in Theorem 1.

Consider the determinantal equation

$$|(\tilde{\Pi}^{*\prime}\tilde{\Pi}^*)^{-1}\tilde{\Pi}^{*\prime}\tilde{\Pi} - \theta I| = 0$$

(which reduces to $|\Pi^{*-1}\Pi - \theta I| = 0$, which is equivalent to $|\Pi - \theta\Pi^*| = 0$
when $\Pi$ and $\Pi^*$ are of size $m \times m$). Then $(\theta_1, \cdots, \theta_m)$ is just $(\lambda_K^1, \cdots, \lambda_K^m)$,
where $K$ is the stratifier, if $|\Lambda_2| \neq 0$ and $|\tilde{\Lambda}_1'\tilde{\Lambda}_1| \neq 0$. Also, by the procedure
given by Anderson, $\Lambda_2$ can be determined. Finally, since

$$\tilde{\Pi}^*\Lambda_2'^{-1} = \tilde{\Lambda}_1 N \Lambda_2' \Lambda_2'^{-1} = \tilde{\Lambda}_1 N,$$

and since the first row of $\tilde{\Lambda}_1$ is $(1, 1, \cdots, 1)$, the diagonal elements of $N$ are
just the first row of $\tilde{\Pi}^*\Lambda_2'^{-1}$, and $\tilde{\Lambda}_1 = \tilde{\Pi}^*\Lambda_2'^{-1}N^{-1}$. In the extreme case
under consideration there are two sets of sufficient conditions for
identifiability when $m < (K + 1)/2$.

I. If there exists a partition such that $|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$, and there exist
$K - 2m + 2$ items ($j = 2m - 1, \cdots, K$) such that $\lambda_j^{\alpha} \neq \lambda_j^{\beta}$, $\alpha \neq \beta$, for each
of these items, then the latent class model is identifiable (Anderson).

II. If there exists a partition such that $|\Lambda_2| \neq 0$ and $|\tilde{\Lambda}_1'\tilde{\Lambda}_1| \neq 0$, and if
there exists a stratification such that $\lambda_K^{\alpha} \neq \lambda_K^{\beta}$, $\alpha \neq \beta$, for this stratification,
then the latent class model is identifiable (Gibson).


Necessary Conditions for Identifiability 


It is natural to ask, in view of the sufficient conditions for identifiability
given above, whether any of them are necessary. In each of two cases,
$m < (K + 1)/2$ and $(K + 1)/2 \leq m \leq 2^{(K-1)/2}$, there are two sufficient
conditions for identifiability. Examples exist where one of the sufficient
conditions holds (and hence the parameters are identifiable) and the other
sufficient condition is not satisfied (and hence it is not necessary).

First consider the Anderson and Gibson conditions for K = 4, m = 2, 
Anderson’s conditions reduce to 


MMM, NEN, NER, AMA. 
Now let A} = 3. Let 
ee! 
A, =|]\5 Mal, n=l ‘ and a= | " 
MX? A OM 0 XM 
Then 
| ArA, | = (As — Aa)” + (As)*(AS)” + (A2)”[(As)” — 25A3] > 0 


when \; > 2d; . Hence the model is identifiable by Gibson’s condition, but 
Anderson’s conditions are not met. Therefore, Anderson’s conditions are not 
necessary for (K + 1)/2 > m. 

Now assume that Anderson's conditions hold. Then $\tilde{\Lambda}_1$, being $\Lambda_1$
augmented by more rows, is of full rank, $m$. But if $\tilde{\Lambda}_1$ is of full rank, then $\tilde{\Lambda}_1'\tilde{\Lambda}_1$
is positive definite, hence nonsingular. Therefore, Anderson's condition for
identifiability implies Gibson's condition.

One can see that Gibson's condition for identifiability is not necessary
by considering the following example in the light of the procedure used in 
proving Theorem 4. Let K = 6, m = 3. Let 


N= ATMA, AKA HN, NHeNUNKHNU, 
“MS = APE AS, Ap = AS PERS, AG = AS MAS. 


Then no item can be used as a stratifier in Gibson’s procedure. Now let
(oe ee a a es 
= 2 3 Ys 3 A? = da : 2 7 
MoM OM MOM 
1M a LM MAG 
Let 
‘1 1 14] (oo oe A 
AZ? = ]¥5 AE A], AD? = lA AD AL]. 
LAs Ns AGL LAs AS AG 














Let item 1 be the first stratifier and item 6 be the second stratifier. Assume 
that Aj”, A{” are nonsingular and that A{” and A{” are of full rank. These 
conditions can be met easily. Then by the procedure used in proving Theorem 
4, 


ai 0 
A” eae a : 
0 Ai 


and the last row of Aj” are determined by the first stratification, and 


A” a N F 


and the last row of A;?’ are determined by the second stratification. But 
now As’ and A{”’ are completely determined since the second rows of 
these matrices are just the elements of the two A’s. The order in which the 
\’s in the second row of A{” and A{” appear is determined from the order of 
the third row of Af”, A{”. But once Aj” is determined, N and A{" are easily 
determined, as shown above. Hence the model is identifiable but Gibson’s 
conditions are not met. 

Now examine Theorems 2 and 3 to determine whether or not their 
conditions are necessary for identifiability. Consider in detail an example 
where K = 7 and m = 4. Let 


M=M, M=M, M=HM, MHM, WHA, MHA, 


and A$ ¥ Ab, a ¥ B. Then the partition (1, 2, 5; 3, 4, 6; 7) can satisfy the con- 
ditions of Theorem 2 but not of Theorem 3, for 
he canes NM-M A= 
1 1 3 4 

LiL eee Ce ee 
sdihelttetied HK Er. eee, 
uN NM 
(AS — As)[\rAz — ATAZ + AL(A2 — Aa) + AAT — Ad)] 0, 
for instance, if 4} # AZ, A} = AZ = 0, and A}, AR, AY, Ap ¥ O (and similarly 
for |A,|). Hence the conditions in Theorem 3 are not necessary for identifi- 
ability. 

Now assume that $\lambda_1^{\alpha} = \lambda_2^{\alpha} = \lambda_3^{\alpha} = \lambda_4^{\alpha}$ for all $\alpha$, that $\lambda_j^{\alpha} \neq \lambda_j^{\beta}$ for $\alpha \neq \beta$
for all $j$, and that $\lambda_j^{\alpha} \neq \lambda_{j'}^{\alpha}$, for $j = 5, 6, 7$, $j' = 1, 2, 3, 4$, for all $\alpha$. Then the
partition (1, 5; 2, 6) with stratifiers (3; 4; 7) satisfies the conditions of Theorem
3, but, since in the procedure of Theorem 2 there is only one stratifier used,
one of the matrices $\Lambda_1$, $\Lambda_2$ will have at least two of its rows composed of the
$\lambda$'s for questions 1, 2, 3, or 4. Hence one of the matrices $\Lambda_1$, $\Lambda_2$ will be singular
and so the conditions of Theorem 2 are not necessary for identifiability.

It is to be expected that none of the above discussed conditions is
necessary for identifiability, for the corresponding procedures did not involve all
the $\pi$'s. However, the procedure corresponding to Theorem 1 does involve
all the $\pi$'s, so that one might hope that the conditions of Theorem 1 are
necessary as well as sufficient for identifiability. This conjecture is not true,
as will be seen, but the following theorem lends credence to this conjecture.











THEOREM 5. If $K = 3$, $m = 2$, then a necessary condition for identifiability
is that there exist a stratification and partition such that $|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$,
$\lambda_K^1 \neq \lambda_K^2$, where item $K$ is the stratifier.


PROOF. If there exists no stratification such that $\lambda_K^1 \neq \lambda_K^2$, then all
$\lambda_j^1 = \lambda_j^2$, so $\pi_1 = \lambda_1^1$, $\pi_2 = \lambda_2^1$, and $\pi_3 = \lambda_3^1$, and $\pi_{12} = \pi_1\pi_2$, $\pi_{13} = \pi_1\pi_3$,
$\pi_{23} = \pi_2\pi_3$, and $\pi_{123} = \pi_1\pi_2\pi_3$. Then there are three equations, $\pi_j = \lambda_j^1$
($j = 1, 2, 3$), and four unknowns, the $\lambda_j^1$'s and $\nu_1$, and this situation is
unidentifiable.

Similarly, if there exists no partition such that $|\Lambda_1| \neq 0$ (or $|\Lambda_2| \neq 0$),
then for at least one of the $j$'s, $\lambda_j^1 = \lambda_j^2$ and a modification of the above
argument holds, with the exception that there may be as few as 3 equations in
5 unknowns, and at most 4 equations in 5 unknowns.

Thus if $K = 3$, $m = 2$ [and $m = 2^{(K-1)/2} = (K + 1)/2$ so that Anderson's
and our procedures are exactly the same], then the condition for identifiability
of Theorem 1 (which in this case is also Anderson's condition) is both necessary
and sufficient.

Although it is true for $m = 2$, $K = 3$, from Theorem 4 it is not true in
general that if $m = 2^{(K-1)/2}$, a necessary condition for identifiability is that
there exist a stratifier, $s$, such that $\lambda_s^{\alpha} \neq \lambda_s^{\beta}$, $\alpha \neq \beta$.
In view of Theorems 4 and 5, it is of interest to investigate under
what circumstances it is a necessary condition for identifiability, when
$m = 2^{(K-1)/2}$, that there exist a stratifier $s$ such that $\lambda_s^{\alpha} \neq \lambda_s^{\beta}$, $\alpha \neq \beta$. Trivially,
if $\lambda_s^{\alpha} = \lambda_s^{\beta}$ for all $\alpha, \beta$, $\alpha \neq \beta$, for all $s$, then there are $m + K - 1$ latent
parameters and the number of independent equations is reduced to $K$,
namely those for $\pi_1, \cdots, \pi_K$; hence this is always an unidentifiable situation.
Assume that no stratifier exists such that $\lambda_s^{\alpha} \neq \lambda_s^{\beta}$, $\alpha \neq \beta$. Therefore, for
each $j$ there exists an $\alpha_j$, $\alpha_j'$ ($\alpha_j \neq \alpha_j'$), $\alpha_j$ not necessarily equal to $\alpha_{j'}$, for
$j \neq j'$, such that $\lambda_j^{\alpha_j} = \lambda_j^{\alpha_j'}$. Assume that for each $s$ there are only two $\lambda$'s,
$\lambda_s^{\alpha_s}$ and $\lambda_s^{\alpha_s'}$, which are equal. This case can be treated by a modification of
Theorem 4, for in this case, in the notation of the theorem, $t = 2$ for all
items.

Theorem 4, as it stands, treats among other cases the situation where
$t \geq 2$ for two items where for each item there exists a partition such that
$|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$, and where, if we consider any other item as a candidate
for a stratifier, we find that there is no partition such that $|\Lambda_1| \neq 0$, $|\Lambda_2| \neq 0$.
(We need $t \geq 2$ for both items or else we could use Theorem 1 on this case.)
In the case under consideration now, the procedure of Theorem 4 applies,
with $t = t' = 2$, if the nonsingularity assumptions of the theorem are met.
Hence, except for the case where $m = 2$ and $K = 3$ treated in Theorem 5,
the case where $t = 2$ for all items is identifiable if the nonsingularity
conditions of Theorem 4 are met. (When $m = 2$ and $K = 3$, the fact that $t = 2$
for all items implies $\lambda_j^1 = \lambda_j^2$ for all $j = 1, 2, 3$, i.e., that the model is
unidentifiable. It is also easy to see that the nonsingularity conditions of Theorem 4
are not met for $K = 3$, $m = 2$.)

After the following considerations, the necessity of the other condition 
of Theorem 1, that there exist a stratification and partition such that A, 
and A, based on this stratification and partition are nonsingular, can also be 
examined. 

It can be shown [6] that $\binom{K-1}{(K-1)/2}$ is the smallest number of $\Lambda$-matrices
which must be singular in order that for each stratification and partition the
nonsingularity assumption of Theorem 1 does not hold. For example, if
$K = 5$, $m = 4$, then the following are the possible stratifications and partitions.


(1; 2,3; 4,5) (2; 1,3; 4,5) (3; 1,2; 4,5) (4; 2,3; 1,5) (5; 2,3; 1,4)
(1; 2,4; 3,5) (2; 1,4; 3,5) (3; 2,4; 1,5) (4; 1,2; 3,5) (5; 2,4; 1,3)
(1; 2,5; 3,4) (2; 1,5; 3,4) (3; 2,5; 1,4) (4; 2,5; 1,3) (5; 1,2; 3,4)


If the $\Lambda$-matrices based on the $\binom{K-1}{(K-1)/2} = \binom{4}{2} = 6$ sets of questions
(2, 3), (2, 4), (2, 5), (4, 5), (3, 5), (3, 4) are all singular, then for each
stratification and partition at least one of the $\Lambda$-matrices is singular. In [6] it is
shown that this is the smallest number of sets of items for which this can
happen. If we can determine a different $\lambda_j^{\alpha}$ from each of the $\binom{K-1}{(K-1)/2}$
singular $\Lambda$'s by solving each of the equations $|\Lambda| = 0$ for a different $\lambda_j^{\alpha}$, then
the number of independent latent parameters is reduced to

$$mK + m - 1 - \binom{K-1}{(K-1)/2}.$$
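
The counting in the $K = 5$ example can be checked mechanically. The following Python sketch (the function name and the representation of a stratification and partition as a tuple are illustrative choices, not from the text) enumerates the stratifications and partitions and confirms that the six question sets that exclude item 1 meet every one of them.

```python
from itertools import combinations
from math import comb


def stratifications_and_partitions(K):
    """All (stratifier; first half; second half) splits of items 1..K, for odd K."""
    out = []
    for s in range(1, K + 1):
        rest = [i for i in range(1, K + 1) if i != s]
        anchor = rest[0]  # pin one item to the first half so each split is counted once
        for others in combinations(rest[1:], (K - 1) // 2 - 1):
            first = (anchor,) + others
            second = tuple(i for i in rest if i not in first)
            out.append((s, first, second))
    return out


K = 5
splits = stratifications_and_partitions(K)

# The six sets of questions named in the text: all pairs drawn from items 2..5.
singular_sets = set(combinations(range(2, K + 1), (K - 1) // 2))

# Every stratification and partition contains at least one of those sets.
covered = all(first in singular_sets or second in singular_sets
              for _, first, second in splits)

print(len(splits), len(singular_sets), covered)
```

The number of splits equals $(K/2)\binom{K-1}{(K-1)/2}$, which is 15 for $K = 5$, in agreement with the fifteen entries listed above.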
Now count the number of independent $\pi$'s in this situation. Due to a
$\Lambda$ being singular for each stratification and partition, there are

$$\frac{K}{2}\binom{K-1}{(K-1)/2}$$

singular $\Pi^*$'s and the same number of singular $\Pi$'s,

$$\frac{K}{2}\binom{K-1}{(K-1)/2}$$

being the total number of stratifications and partitions. However, each of the

$$K\binom{K-1}{(K-1)/2}$$

determinantal equations (where $|\Pi| = 0$ and $|\Pi^*| = 0$) cannot be solved
for a different $\pi$, for this will lead to redundancies and identities rather than
solutions.

In fact, since for a given partition and stratification $|\Pi^*| = 0$ implies
$|\Pi| = 0$, the number of independent equations among these

$$K\binom{K-1}{(K-1)/2}$$

is reduced by one half. Knowledge of the singularity of any subset of the
$\Pi^*$'s will yield no knowledge about the singularity of any other $\Pi^*$. This
can be seen by considering that a given $\Pi^*$ may be singular because either
its corresponding $\Lambda_1$ or $\Lambda_2$ (or both) is singular and that the particular
combination $\Lambda_1$, $\Lambda_2$ for that $\Pi^*$ never appears in the factorization of any other
$\Pi^*$. Hence, if in each equation $|\Pi^*| = 0$ one can solve for a different $\pi$,
the number of independent $\pi$'s is reduced by

$$\frac{K}{2}\binom{K-1}{(K-1)/2}.$$

There is also one other $\pi$ which does not appear in any $\Pi^*$ and appears in all
$\Pi$'s and, since $|\Pi| = 0$, is expressible as a function of other (lower order)
$\pi$'s, namely $\pi_{12\cdots K}$. We thus have at least

$$(2^K - 1) - \left[\frac{K}{2}\binom{K-1}{(K-1)/2} + 1\right]$$

equations in at least

$$mK + m - 1 - \binom{K-1}{(K-1)/2}$$

unknowns.
Consider the following table.


                                        minimum number
              minimum number of         of independent
  m     K     independent π's           latent parameters

  2     3            3                         5
  4     5           15                        17
  8     7           56                        43
 16     9          195                        89

We have, then, the following result. 

Let $m = 2$, $K = 3$ or $m = 4$, $K = 5$. Consider the minimal set of $\Lambda$'s
such that for each stratification and partition at least one of the matrices
$\Lambda_1$, $\Lambda_2$ is singular. If we can solve for the maximum number of $\lambda$'s in the
determinantal equations generated by the singularity of these $\Lambda$'s and for
the maximum number of $\pi$'s in the determinantal equations generated by
the singularity of all the $\Pi$'s and $\Pi^*$'s, then this latent class model is
unidentifiable.

We need not have mentioned the case $m = 2$, $K = 3$ in this result, as
that case is covered in Theorem 5. The significance of this result is that under
mildly restrictive circumstances the nonsingularity conditions of Theorem 1
are necessary for identifiability for $K = 5$, $m = 4$ (as was true for $K = 3$,
$m = 2$), and that even under these circumstances the nonsingularity condition
need not be necessary for $K \geq 7$.

Since the conditions of Theorem 1, say, can hold without those of 
Theorem 4 or those underlying the use of the permutation procedure, it is 
evident that these latter are not necessary for identifiability. 

To summarize, (i) it has been shown by example that the conditions of 
Theorems 2 and 3 as well as a particular version of Gibson’s and Anderson’s 
conditions, are not necessary for identifiability; (ii) some cases have been 
discovered for which the sufficient conditions of Theorem 1 are necessary; 
and (iii) by just counting equations and unknowns one cannot come to any 
conclusions about the necessity of the conditions of Theorem 1 for $K \geq 7$.


A COMPUTER PROGRAM TO FIND THE BEST-FITTING 
ORTHOGONAL FACTORS FOR A GIVEN HYPOTHESIS 


DAVID R. SAUNDERS 
EDUCATIONAL TESTING SERVICE 


A modification of the quartimax computation for factor rotation is 
described in which a hypothesized factor pattern is given to the machine along 
with the data. The machine uses the pattern to select the subset of variables 
to which it will attend when rotating in a given plane, in order to find an 
orthogonal solution which closely fits the hypothesis. The program also 
provides a measure of the goodness of this fit. The program can utilize pattern 
matrices that reflect only partial hypotheses as to the nature of the factors, 
as well as those that specify highly dicomsaed simple structure. 


The problem of rotation in factor analysis has been the focus of more 
than one controversial debate. Much of the argument can be boiled down 
to the fact that factor analysis is a very versatile tool, and different people 
have wanted it to serve different and sometimes conflicting ends. The advent 
of electronic computers, by making it possible to do factor analysis more 
quickly and cheaply, has aggravated this situation. While some have argued
that computers should be used to do factor analysis better, rather than
merely more quickly, it has been difficult to define what “better” should mean.

Rotation is one of the difficult cases in point. Since computers have 
become available, there have been no less than fifteen distinguishable ap- 
proaches to formulations of simple structure [1-15] that are more or less 
amenable to computer programming. While several of these have been shown 
to be algebraically equivalent under the restriction of orthogonality [5, 10], 
they are relatively diverse when applied to the oblique case. Carroll’s 
“biquartimin” formulation [2] even permits the machine operator to achieve 
any of an infinity of solutions by the simple choice of a constant that assigns 
relative weight to two of the simpler formulations. Charles F. Wrigley 
(private communication) and Gocka [5] have observed that even the simplest 
and most widely used of all of the analytic formulations—quartimax—is 
capable of yielding more than one numerically distinct solution from the 
same data, that may or may not evoke psychologically distinct interpre- 
tations, depending on the particular position of the axes at the input stage 
of the program. In view of these happenings, it seems clear that machine 
rotation has yet to provide the kind of objectivity that would be desirable; 
this objectivity is indeed demanded if a serious present criticism of psycho- 
logical factor analysis is to be met. 

In the quest for objectivity, machine programs for rotation have pushed 
into second place what is for many practitioners the primary goal of rotation, 
namely, finding the “right answer.” It will typically be conceded by such 
persons that the machine programs do save at least 80 percent or even 90 
percent of the effort that they would previously have committed to a con- 
ventional rotation. In trivial problems almost any of the programs will save 
100 percent of the work, and in nontrivial problems almost any of the programs 
will probably yield acceptable results, i.e., results that can be meaningfully 
interpreted. However, the most plausible interpretation in such a situation 
will almost surely suggest some improvement in the placement of axes, the 
carrying out of which will introduce a deviation from the complete impartiality 
of the machine program. Since different machine formulations would probably 
not have yielded the same results in the first place, the introduction of such 
an element of subjectivity does not create any new difficulty. 

Until such time as there may be widespread acceptance of a given 
machine formulation of the rotational problem, rotation to simple structure— 
or according to any psychologically interpretable set of principles—is at 
least partly a matter of the experimenter’s subjective choice. Under these 
conditions, the ideal machine program will be one that allows the experimenter 
maximum freedom to express his choice and, at the same time, makes perfectly 
explicit the nature of this choice and its influence on the reported results. 
It is the purpose of this report to describe a machine procedure designed to 
incorporate these properties, and to indicate some of the ways in which it 
may be used. 


Logical Basis of the Program 


A machine program has been written as an elaboration of the familiar 
quartimax method, which is the simplest of the machine methods to program 
and which maintains orthogonal axes. (This program was written for 
the basic IBM 650 by Carl E. Helm and Ruth Bredon. Program details 
and duplicate program decks are available from the IBM 650 Program 
Library, File No. 5.1.007.) 

The elaboration provides for two distinct aspects of input. One of these 
is the usual factor matrix input, derived from empirical data and employing 
any convenient set of reference axes. The other is a pattern matrix of the 
same size as the factor matrix, but whose elements are simply 0’s and 1’s. 
The elements of the pattern matrix may be assigned in a variety of ways to 
reflect different possible hypotheses as to the over-all factor structure. The 
hypotheses may be based upon purely subjective considerations or found in 
a relatively objective manner using previous results. 

A precise statement of the computational role of the pattern matrix 
will be given below. An intuitive indication of its meaning may be gained 
from the rough generalization that 0’s in the pattern identify loadings that 
are expected to be negligible in magnitude after rotation and 1’s in the 
pattern identify loadings that are expected to be appreciable, regardless of 
whether they are expected to have a positive or negative sign attached to 
them. This is only a rough generalization because it is necessary to assign 
a 0 or 1 for each and every loading even when it is uncertain whether to 
expect a small or large loading. If the pattern is conceived as defining a set of 
distinct hyperplanes by the location of its 0’s, then the uncertain loadings 
will probably be assigned 1’s. If the pattern is conceived as defining some or 
all of the factors through their high loadings, then the uncertain loadings 
will probably be assigned 0’s. Either conceptual approach may lead to a usable 
pattern matrix. However, the pattern must contain at least a minimum 
number of 0’s in order for the program to operate. 

The major elaboration in the program enables it to utilize two rules for 
the selection of variables to be considered in applying the quartimax criterion 
in each plane of possible rotation. Before carrying out any rotation in a given 
plane, the program compares the patterns that have been assigned to the 
two factors involved. If the patterns for the two factors are identical, the 
rotation is executed giving equal weight to each variable assigned a pair of 
0’s, and no weight to variables assigned a pair of 1’s. If the patterns are not 
identical, the rotation is executed giving equal weight to all variables for 
which the patterns differ, and no weight to variables assigned either a pair 
of 0’s or a pair of 1’s. 
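
The two selection rules can be stated compactly in code. The sketch below is an illustration only, not a transcription of the IBM 650 program; the function name and the representation of each factor's pattern as a list of 0's and 1's (one entry per variable) are assumptions made here.

```python
def selected_variables(pattern_a, pattern_b):
    """Return the indices of the variables given weight when rotating one pair of factors.

    pattern_a, pattern_b: the pattern columns (lists of 0/1) for the two factors.
    """
    pairs = list(zip(pattern_a, pattern_b))
    if pattern_a == pattern_b:
        # Identical patterns: use the variables assigned a pair of 0's.
        return [i for i, (a, b) in enumerate(pairs) if a == 0 and b == 0]
    # Differing patterns: use the variables on which the two patterns differ.
    return [i for i, (a, b) in enumerate(pairs) if a != b]


print(selected_variables([0, 1, 0], [0, 1, 0]))        # identical patterns
print(selected_variables([1, 0, 0, 1], [0, 0, 1, 1]))  # differing patterns
```

Under either rule a variable assigned a pair of 1's never receives weight, which is why a pattern consisting solely of 1's leaves nothing to sum over.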

It will be evident, for example, that if no pattern is introduced, or a 
pattern consisting solely of 0’s is introduced, the program will yield the 
familiar raw quartimax results. A pattern that is all 0’s is the appropriate 
way to represent the hypothesis of complete uncertainty as to the expected 
factor structure. (A pattern consisting solely of 1’s cannot be used because 
every variable in every plane will be found to have a pair of 1’s, the sum- 
mations required for the quartimax solution of the angle of rotation will 
always be over zero terms, and this will lead the machine to an indeterminate 
division of 0 by 0.) With the pattern composed of all 0’s, the patterns assigned 
for a pair of factors will always be identical and the summations will always 
include all of the variables. 
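
The indeterminate case can be made concrete with a small sketch. The text does not give the program's formulas, so the closed form used here is the standard planar quartimax solution (the angle maximizing the sum of fourth powers of the two rotated columns), adopted as an assumption; the function name, the `use` argument, and the rotation convention $x' = x\cos\phi + y\sin\phi$, $y' = -x\sin\phi + y\cos\phi$ are likewise illustrative.

```python
import math


def quartimax_angle(x, y, use):
    """Angle of rotation for factor columns x and y, summing only over the indices in `use`.

    Returns None when both sums vanish, which is the 0/0 case described in the text.
    """
    c = d = 0.0
    for i in use:
        u = x[i] ** 2 - y[i] ** 2
        v = 2.0 * x[i] * y[i]
        c += u * u - v * v   # denominator sum
        d += 2.0 * u * v     # numerator sum
    if c == 0.0 and d == 0.0:
        return None
    return math.atan2(d, c) / 4.0


x = [math.cos(0.3), 0.2]
y = [math.sin(0.3), 0.5]

print(quartimax_angle(x, y, []))    # a pair of 1's for every variable: nothing to sum
print(quartimax_angle(x, y, [0]))   # only variable 0 attended to
```

With only variable 0 selected, the angle returned brings that variable onto the first axis, so its loading on the second factor becomes zero.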

While the rules stated above define the basic logic of the program, and 
are sufficient to understand its operation, it should be noted for the sake of 
others who may wish to write similar programs for their machines that two 
additional and possibly unexpected subroutines are required. One of these 
insures that a pair of factors that have just been rotated are properly matched 
with the patterns they most resemble when returning them to memory. 
(No control need be exercised over the reflection of the factors.) The other 
allows for pattern modification after each of the first cycles of iteration, so 
that certain spurious solutions may be avoided; the importance of this 
provision will be indicated below. 

In order to provide measures of the quality of the structure obtained by 
rotation, two sums are computed at the end of each cycle. One is the sum of 
the fourth powers of all the elements in the matrix, which is the function that 
quartimax tries to maximize; the introduction of any pattern will reduce this 
sum below its maximum. The other is the sum of squares of all elements 
whose pattern assignments are 0; comparison of this sum for patterns of 
equivalent complexity applied to the same data will reveal the pattern with 
which the data are more consistent. Since the sum of squares of all the loadings 
for any variables is a constant regardless of any orthogonal transformation, 
it may be seen that only variables whose patterns contain at least one 0 and 
at least one 1 can contribute to observed variations in this special sum of 
squares. It is anticipated that the theory of least squares should be developed 
to provide a direct statistical evaluation of the significance of this special 
sum of squares. 
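
The two sums are simple to compute. In the following sketch the function name and the row-per-variable layout of the loadings and pattern are assumptions made for illustration; the example values are hypothetical.

```python
def fit_measures(loadings, pattern):
    """Return (sum of fourth powers of all loadings,
               sum of squares of the loadings whose pattern entry is 0)."""
    fourth_powers = sum(v ** 4 for row in loadings for v in row)
    zero_squares = sum(v ** 2
                       for row, prow in zip(loadings, pattern)
                       for v, p in zip(row, prow) if p == 0)
    return fourth_powers, zero_squares


# Two variables, two factors (hypothetical values).
loadings = [[0.5, 0.5],
            [1.0, 0.0]]
pattern = [[1, 0],
           [1, 0]]

print(fit_measures(loadings, pattern))
```

A smaller second sum, for patterns of equivalent complexity applied to the same data, indicates the pattern with which the data are more consistent.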

Rotational methods with objectives that are relatively similar to those 
of the present program have been developed by Horst [6], Rodgers [11], and, 
in an unpublished paper, by Cattell. These methods are similar in that they 
provide for prior establishment of a hypothesis and endeavor to fit the 
rotational solution to the hypothesis. Horst’s method is relatively most 
similar; in the Horst method the hypothesis is specified in terms of a pattern 
matrix of 0’s and 1’s. Any pattern matrix acceptable to Horst’s method is 
expected to yield substantially the same results if used with our new program. 
The new program, on the other hand, is capable of utilizing a large class 
of patterns unacceptable to Horst’s method, but appropriate in the common 
situation when explicit hypotheses are available for only some of the factors. 
Examples of such situations will be found in the following section. 

Horst’s computational plan, example, and discussion implicitly assume
the existence of a positive manifold type of simple structure. If a bipolar factor
is expected, Horst’s pattern matrix must contain a negative entry. No signs
are employed with the pattern matrix in our program.

It may be noted that in Horst’s method the pattern matrix becomes 
actively involved in the computations as a matrix multiplier. The pattern 
entries are essentially used as weights, even though they are less highly 
differentiated weights than those used by Cattell or Rodgers. In our program 
the pattern matrix is simply a convenient framework for storing information 
that is the substratum for certain specially devised logical operations; weight 
can be given to variables only when their pattern is found to contain a 0. 
In our pattern matrix the symbols 0 and 1 themselves are not really numbers, 
and are used only as a matter of convenience. Any two symbols, such as * 
and #, could serve equally well. 

In principle, we agree with a reviewer’s comment that varimax [7] would 
provide a better basis for this program than quartimax. In practice, however, 
the potential effect of using a pattern is much more substantial than the 
difference between varimax and quartimax. In more than one instance this 
program has been applied to improve substantially even on results yielded 
by the normal varimax method. The decisive consideration in applying the 
pattern idea with the limited capacity of the IBM 650 is that quartimax 
requires fewer instructions than varimax. 

It will be seen that the logical formulation of the new program empha- 
sizes flexibility in choice of pattern—making a wide variety of solutions 
rapidly available by use of a single program—as well as clear separation 
between empirical and subjective inputs to the computations. 


Results to be Expected from Certain Patterns 


Suppose one wishes to rotate a factor matrix involving 5 factors and 10 
variables. Various patterns that might be used in connection with this problem 
are shown and identified in Table 1. 


TABLE 1 
10-Variable, 5-Factor Matrix 

Pattern A will yield the conventional quartimax result, and has been 
discussed. This pattern represents a hypothesis of complete ignorance of the 
expected factor structure after rotation. 

Pattern B assumes that the factor matrix has had a 5 by 5 identity 
matrix added at the bottom; rotation will now yield both the quartimax 
result and the transformation necessary to compute it from the input matrix. 
This device may be used generally to obtain the transformation associated 
with any rotational solution found by the program. 
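
The identity-matrix device rests on a simple fact: whatever orthogonal rotations are applied to the columns of the factor matrix are applied equally to the appended rows, so those rows end up holding the transformation itself. The sketch below illustrates this with a single planar rotation; the loadings, the angle, and the helper name `rotate_plane` are hypothetical, and the pattern entries that Pattern B assigns to the appended rows are not modeled.

```python
import math


def rotate_plane(matrix, j, k, phi):
    """Rotate columns j and k of every row through the angle phi."""
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    out = [row[:] for row in matrix]
    for row in out:
        xj, xk = row[j], row[k]
        row[j] = xj * cos_phi + xk * sin_phi
        row[k] = -xj * sin_phi + xk * cos_phi
    return out


factors = [[0.6, 0.6], [0.8, -0.2], [0.1, 0.7]]   # 3 variables, 2 factors
n_factors = 2
identity = [[1.0 if r == q else 0.0 for q in range(n_factors)] for r in range(n_factors)]

rotated = rotate_plane(factors + identity, 0, 1, 0.4)
rotated_factors = rotated[:len(factors)]
transformation = rotated[len(factors):]            # what the identity rows became

# Multiplying the input factors by the recovered transformation reproduces the rotation.
reproduced = [[sum(f * t for f, t in zip(row, col)) for col in zip(*transformation)]
              for row in factors]
print(transformation)
```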

Pattern C will place the first factor collinear with variable 4, and yield 
loadings of exactly zero for this variable on the remaining factors. At the 
same time, Factors II through V will be the quartimax factors available in 
the space orthogonal to variable 4. Such a pattern may be used to explore 
the implications of the hypothesis that variable 4 is unifactorial. 

Pattern D will separate the plane of variables 4 and 7 from the three- 
space orthogonal to this plane, and yield the raw quartimax factors available 
in this space. Factors I and II will depend on the quartimax rotation of 
variables 4 and 7 only, and are likely to make sense if these two variables 
are reasonably uncorrelated with each other. 

Pattern E will separate the same plane from the same three-space and 
yield the same last three factors as Pattern D. However, the rotation of 
Factors I and II will now depend on all the variables except 4 and 7. 

Pattern F is obviously intended to isolate the same plane and yield the 
same last three factors as Patterns D and E, but then to rotate Factor I 
in the plane through variable 4. Factor II emerges as close to variable 7 as 
the correlation of variable 7 with variable 4 (and Factor I) will permit. This 
is an example of a pattern that cannot safely be used directly; it should be 
preceded by at least one cycle of iterations using a simpler pattern such as 
D. Pattern F will not produce the intended result if at any stage subsequent 
to its introduction Factor I happens to account for relatively more variance 
in variable 7 than it does in variable 4. 

Pattern G is more complex. Variable 2 will be ignored completely, 
although all its loadings will appear in the final results. The program may or 
may not be able to provide a solution exhibiting the two overlapping group 
factors hypothesized as Factors I and II. No harm will be done if Pattern 
H is used in the first cycle of iterations, to be sure the initial rotations are 
made in the intended direction. 

Pattern I is interesting in that it does not hypothesize any factorially 
pure tests, yet, if we regard the 0’s as identifying loadings expected to be 
insignificant and the 1’s as identifying those expected to be large, the structure 
is completely determinate. It is the only pattern shown which could be used 
in Horst’s method [6]. 

Pattern J is designed to isolate a circumplex-like structure within the 
plane of the first two factors, anchoring axis I with variable 5. Influences
disturbing to the circumplex-like structure will be rotated into the last three
factors and into a quartimax position with respect to each other. Variable 5, 
of course, must correlate exactly zero with the latter factors, and is being 
hypothesized to contain no variance that will disturb its relation to the 
circumplex structure. 

In the course of writing the computer program, several actual examples 
have been run; useful results have been obtained. No such examples are 
included here because these are more properly reported in the context of the 
substantive research to which they relate.


MATRIX FORMULATION OF DEUEL’S ROTATIONAL METHOD


Deuel’s rotational method for factor analysis is translated from vector
algebra to matrix algebra. A computational example is presented using 
Deuel’s nomogram. 


Deuel [1] has presented a short-cut method for calculating the factor 
loadings for radial rotations by use of a nomogram. The derivation is pre- 
sented in terms of vector algebra. However, the relationships can easily be 
translated into Thurstone’s matrix algebra formulation. 

Essentially, the method involves use of the H matrix to go between
successive V matrices and Λ matrices ([3], ch. 10). For each reference vector,
Deuel’s d is equivalent to the corresponding diagonal element of Thurstone’s
D matrix. The nomogram is a substitute for performing the multiplication
ΛS and then calculating the factors which normalize the columns of the
resulting matrix. The elements in each column of the S matrix are then
multiplied by the corresponding d values to give the H matrix. Such use
is restricted to the case where each reference vector is moved along
the plane generated by itself and one other reference vector, but
this does not seem to be a cumbersome restriction in practice.
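
The matrix relations described above can be sketched numerically. The example assumes NumPy and Thurstone's usual relations, in which the transformation matrix (Λ in his notation) has unit-length columns, H = SD, and the new V matrix equals the old one postmultiplied by H. All data are random stand-ins, and the function and variable names (`h_from_s`, `lam01`, `S12`) are hypothetical. The normalizing factors are computed directly rather than read from the nomogram.

```python
import numpy as np

def h_from_s(lam, s):
    """Scale each column of s so that the columns of lam @ H have unit length."""
    lengths = np.linalg.norm(lam @ s, axis=0)  # lengths of the unnormalized new vectors
    d = 1.0 / lengths                          # diagonal elements of D
    return s * d, d                            # broadcasting scales column j by d[j]

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 3))                    # hypothetical original factor matrix
lam01 = rng.normal(size=(3, 3))
lam01 /= np.linalg.norm(lam01, axis=0)         # unit-length reference vectors
V1 = F @ lam01

# Each column of S has two nonzero entries: every reference vector moves
# in the plane of itself and one other reference vector.
S12 = np.array([[1.0,  0.0, 0.3],
                [0.2,  1.0, 0.0],
                [0.0, -0.4, 1.0]])

H12, d = h_from_s(lam01, S12)
lam02 = lam01 @ H12                            # new transformation matrix
V2 = V1 @ H12                                  # new loadings via the H matrix

print(np.allclose(np.linalg.norm(lam02, axis=0), 1.0))  # True
print(np.allclose(V2, F @ lam02))                        # True
```

The last line is the kind of computational check that guards against cumulated rounding error: the loadings obtained through H are compared with those obtained directly from the original factor matrix.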

As an illustration of the method, Guilford’s ([2], pp. 510-514) second
oblique rotation of the Army Alpha factors is calculated by Deuel’s method,
as shown in Table 1. The relevant cosines are obtained from $M_1$, and the S
values come from $S_{12}$. For each reference vector, the nomogram is entered
with $|S + \cos\theta|$ and $|\cos\theta|$ to obtain $d_j$. These multipliers are then applied
to the columns of $S_{12}$ to obtain $H_{12}$. The new factor loadings are calculated
by multiplying $V_1$ by $H_{12}$. The check row of $V_2$ is obtained by multiplying
the sum row of $V_1$ successively by the columns of $H_{12}$. The absolute
discrepancies between our values and Guilford’s values ([2], p. 513) average
.003, with no discrepancy exceeding .008. Similarly, $\Lambda_{02}$ is obtained by
multiplying $\Lambda_{01}$ by $H_{12}$.

Note that all the elements in these two matrix products require calculating
the algebraic sum of only two products (ab + cd). This is ordinarily
not the case when each new set of reference vectors is referred back to the
original F matrix. However, Thurstone’s caution ([3], p. 210) regarding
cumulation of rounding errors when the H matrix is used should be kept in
mind. This would mean, for example, calculating $V_2$ by the multiplication
$F\Lambda_{02}$ instead of the multiplication $V_1 H_{12}$ as a computational check.


TABLE 1


Calculation Matrices


BOOK REVIEWS


L. L. THURSTONE. The Measurement of Values. Chicago: University of Chicago Press, 1959.
Pp. viii + 322. 


This book contains twenty-seven articles written by Dr. Thurstone developing 
mathematical theories for measurement of subjective qualities and applying these scaling 
methods to analyses of data from various experiments in the area of the behavioral sciences. 
These papers are of outstanding interest; many of them have become essentially unavail- 
able to students. This volume, as Mrs. Thurstone indicates in the Preface, is the result of 
combined efforts by a group of Mr. Thurstone’s former students, the University of Chicago 
Press, and support from the Ford Foundation. 

In organizing his papers for possible publication, Mr. Thurstone classified them 
into five sections: learning, test theory, factor analysis, psychophysics, and applications 
in the behavioral sciences. He also prepared introductory statements to precede each of
the five sections. The present volume includes the last two categories. No plans are under- 
way at present to publish papers of the first three sections. 

The volume begins with his 1936 address as the first retiring president of the Psychometric
Society, “Psychology as a Quantitative Rational Science.” He stated and illustrated
the objectives of the Society and of his own work in developing and verifying mathematically
stated theories for various psychological problems. In this address he expressed his fundamental
viewpoint: “Psychological theory can be rigorous. There is an erroneous impression
among psychologists, as well as among our academic neighbors, that psychological ideas 
are necessarily loose, verbal, subjective, and unfit for the quantitative analytical treatment 
of science. This impression is not justified.” 

Furthermore, he holds that while it is desirable to have physiological, clinical, or 
physical explanations of psychological phenomena, it is not essential. “It would be unfortu- 
nate if the development of any psychological idea should be restricted because of a compul- 
sion to make it look like physiology or to make it look like sociology . . . It is not necessary 
for us to abandon psychological concepts if we introduce analytical rigor in dealing with 
these concepts... It is better to formulate the laws of learning in terms of psychological 
ideas, and to find them experimentally verified, than to wait until the phenomena of 
learning can be rationalized in neurological terms.” 

The concluding words of that address are as relevant today as they were in 1936. 
“In encouraging students to help us build an integrated interpretation of mental phenomena
on an experimental foundation, let us remember that a psychological theory is not good
simply because it is cleverly mathematical, that an experiment is not good just because
it involves ingenious apparatus, and that statistics are merely the means for checking
theory with experiment. In the long run we shall be judged in terms of the significance,
the fruitfulness, and the self-consistency of the psychological principles that we discover.”

The rest of the book reviews Dr. Thurstone’s work in one field, psychophysics as 
applied to one-dimensional measurement. Or we may restate the problem as one of under- 
standing how living organisms behave when forced to make difficult decisions or fine dis- 
criminations. The two basic psychological concepts used are the discriminal process and 
the discriminal dispersion. It is shown that these psychological concepts can be used to 
formulate experimentally testable laws of human behavior. 

The articles reprinted deal only with problems of linear measurement. Multidimen- 
sional scaling is given only a brief mention. The law of comparative judgment is developed 
in detail. The law of categorical judgment is assumed and utilized in analyzing successive 
intervals data, but is not developed and explained.

We do not have here any grandiose pretense that a theoretical formulation is being
given for all human behavior. We have rather a serious and successful formulation of 
psychological laws dealing with one limited and manageable aspect of behavior—namely, 
judgments regarding psychological characteristics that constitute a linear continuum. 

The fact that the law of comparative judgment and other theoretical formulations 
using the basic concepts of the discriminal process and discriminal dispersion may be used 
in studying various social, sociological, moral, or aesthetic problems is still not adequately 
realized. Whenever the comment is made about a given problem, “Oh, you can’t deal with
that problem, it’s all a matter of opinion and the opinions disagree,” the psychologists’
answer can be “The law of comparative judgment and other subjective measurement
methods were developed for exactly such problems.” 

The seventeen articles in Part II deal primarily with the development and use of 
the law of comparative judgment. The general theoretical model is given first and then the 
specific development of the various cases of the law of comparative judgment. Articles 
5 to 10 explore the relationship of this new theoretical approach to various aspects of 
traditional psychophysics. The problem of “equally often noticed differences” is considered.
A clear exposition is given to show that equally often noticed differences are subjectively 
equal only when the discriminal dispersions of the stimuli involved are equal. The inde- 
pendence of Fechner’s law, Weber’s law, and the law of comparative judgment is demon- 
strated. It is shown that any one of these laws may hold for a given set of data while the 
other two do not. Or any two may hold while the other one does not. A verification of 
Fechner’s law using visual dot density and the method of equal appearing intervals is given 
in Article 9. 
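
The claim that equally often noticed differences are subjectively equal only when the discriminal dispersions are equal can be sketched with the general form of the law of comparative judgment. The notation below is the customary one for the law and is not quoted from the volume: $S_i$ is a scale value, $\sigma_i$ a discriminal dispersion, $r_{ij}$ the correlation between the two discriminal processes, and $x_{ij}$ the normal deviate corresponding to the proportion of times stimulus $i$ is judged greater than stimulus $j$.

```latex
S_i - S_j = x_{ij}\,\sqrt{\sigma_i^{2} + \sigma_j^{2} - 2\,r_{ij}\,\sigma_i\,\sigma_j}
```

Two pairs of stimuli that are discriminated equally often share the same $x_{ij}$, so their scale separations are equal only if the radical takes the same value for both pairs, as it does when all dispersions and correlations are equal.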

The phi-gamma hypothesis, namely that the psychometric curve, or φ(γ), is the
integral of the normal probability curve, is also discussed. It is shown that this hypothesis
cannot be true unless the absolute limen is constant for all stimulus intensities. If Weber’s
law holds, then φ(log γ) is the integral of the normal curve.

The law of comparative judgment is applied in the field of social psychology to 
experimental data regarding seriousness of various offenses, and to nationality preferences. 
Article 10 shows how the law of comparative judgment may be applied to rank order 
experiments and illustrates with data on handwriting specimens. 

One solution is given for the problem of determining a rational origin for the scale
in terms of obtaining preferences for pairs of objects. The practicality of this solution is 
demonstrated by applying it to data on preferences for birthday gifts. 

Theorems relating to prediction of choice are developed and are shown to work 
satisfactorily when applied to food choices from menus. An extension into economic theory 
is given with the development of the indifference function and the successful application 
of the theory to data on choices among consumer goods—hats, shoes, and overcoats. 

It is repeatedly emphasized that the law of comparative judgment is not simply a 
computing routine for determining scale values for a set of objects. It is a mathematically 
stated theory of human behavior with which the data may agree or disagree. For each set 
of data the theory being investigated is stated, and the test for agreement between data 
and theory is specified. A measure of the discrepancy between theory and data is obtained. 
The result of the test—verification of the theory or not—is stressed. Determination of 
scale values of objects is clearly incidental to the study of laws of human behavior. 

Part III (Articles 19 to 27) deals with the application of psychophysical scaling 
procedures to various phases of the problem of attitude measurement. The general approach 
and basic assumptions are given in “Attitudes Can Be Measured” (Article 19). The rank 
order method and the method of paired comparisons are then applied to measurement of 
attitudes toward prohibition and to nationality preferences. 

The method of similar reactions developed in Article 22 is deserving of special attention
for those interested in the attitude scaling methods. It presents an attempt to determine
scale values of attitude statements from the percentage of joint endorsements. Present
methods of latent class analysis make use of similar data. Professor Thurstone felt that a 
problem analogous to the communality problem had never been adequately dealt with 
in these methods and hoped for a more adequate theory to develop the underlying scale 
from data on joint endorsements. 

The remaining articles deal primarily with the application of attitude scales to the 
measurement of attitude toward the movies in general and to the measurement of the 
effect of specific movies on attitude toward the Chinese, gambling, and prohibition. These 
are only a few of some thirty experiments on the effect of movies, which were supported 
by a Payne Fund research grant and published only as lithoprinted reports. 

This book will be valued as giving a picture of the development of Professor 
Thurstone’s thought regarding a particular problem—psychophysical measurement—over 
a period of almost thirty years. The first articles were published in 1927, the last in 1956 
by Lyle V. Jones after Thurstone’s death. Particularly impressive is the solid theoretical 
foundation carefully laid in the beginning so that it could be developed without radical 
change for such a period of time. The book is also of interest simply for its presentation 
of the basic theory of the law of comparative judgment and extensive evidence that the 
law is in agreement with data on human behavior. With the modern growth of interest in 
decision processes and utility measurements it is interesting to see how this problem of 
understanding human behavior in making difficult or ambiguous judgments was approached 
by one of the most original psychological minds of this century. 

Harold GULLIKSEN
Princeton University 


J. P. GUILFORD. Fundamental Statistics in Psychology and Education. (3rd ed.) New York:
McGraw-Hill Book Company, 1956. Pp. xi + 565. 


FRANCIS G. CORNELL. The Essentials of Educational Statistics. New York: John Wiley and
Sons, Inc., 1956. Pp. xii + 375. 


The teaching of statistics to students in the social sciences, particularly education, 
poses the knotty problem of teaching what is essentially a mathematical subject to students 
who by and large have a relatively low level of mathematical sophistication. A number of 
texts are marketed to reach this field; the texts under review represent two of these. 

Both of these are good texts. Both are teachable, both provide good coverage of the 
basic statistical methods appropriate to research in education and psychology, both provide 
numerical examples and exercises in sufficient quantity. Both books, indeed, provide more 
material than it is possible to cover adequately in a two-semester sequence, leaving the 
instructor with considerable choice as to topics to accentuate or slight. 

The differences between the two texts are found more in the style and organization 
than in the topical line-up. Guilford’s is the third edition of a well-known text first published 
in 1942. The current edition is mildly changed from the second, primarily in the treatment 
of hypothesis testing, chi square, and analysis of variance. Guilford suggests that the first 
nine chapters constitute a first course in statistics, thus making it clear that he conceives 
of a first course as composed of descriptive statistics plus some classical notions of sampling 
at the end of the course. The wealth of material in this section of the book is such that a 
teacher must be tough minded in forcing the pace if the student is to get beyond the tech- 
niques of descriptive statistics. 

Guilford’s intense interest in psychometrics dominates the second portion of the 
book. The emphasis is strongly on regression and correlation methods, with analysis of 
variance logic and techniques given relatively cursory development. Prediction and
selection, reliability and validity of measures, and test scales and norms form the content of
the last third of the book.

Guilford’s treatment of analysis of variance, though expanded from the second 
edition, leaves something to be desired. He considers only three models, viz., simple one-way 
classification, two-way classification with one entry per cell, and two-way classification 
with replication in each cell. He does not mention the difficulty encountered in the third 
model when cell frequencies are disproportionate, nor does he make the distinction between 
fixed and random effects. He suggests that a nonsignificant interaction be pooled with the 
within cell variation to test main effects, even though the example cited appears to be a 
fixed effect model. He does not mention models that are concerned with repeated
measurements on independent groups of subjects, nor does he find space for presentation of simple
analysis of covariance. These omissions are unfortunate in that experimental designs 
involving comparisons of rates of change among groups, e.g., learning or changes in attitude 
as measured by pre- and post-tests, are quite common in education and psychology. In
particular, the appropriateness of analysis of covariance to these situations is alone worth 
its inclusion. 

Guilford meets the difficulties of the nonmathematical student through a discursive 
style often pegged to an illustrative example. The book is lengthy, about 50 percent longer 
than Cornell’s. Guilford has done an excellent job of keeping his symbolism reasonably 
simple and consistent. Formulas are presented without proof, but with care to define each 
symbol in the formula. He refrains from algebraic manipulation of formulas, though he 
does collect a number of derivations and algebraic proofs into an appendix. For the student
who reads easily but is disinclined to think mathematically the book is well adapted. On
the other hand, it does little to stimulate interest in the mathematical aspects of statistics 
on the part of the better students; for this they must go outside of Guilford. 

Content-wise, Cornell covers much the same material as Guilford but with consider- 
able difference in emphasis and organization. Cornell’s style is terser, with more emphasis 
on symbolic notation and algebraic manipulations. He borrows freely from the symbolism 
of the mathematical statistician, though carefully explaining his notation as he goes. The 
student who has the necessary mathematical maturity—or who can acquire it as he goes 
along—to deal with Cornell’s more symbolic approach will gain in understanding of the 
logic of probability and hypothesis testing. But the student who has difficulty with the 
art of finding meaning in symbols will have rough going. 

While Guilford treats his introduction to correlation and regression as part of his 
development of descriptive statistics, Cornell interposes three chapters on probability, 
sampling theory, and testing hypotheses before he introduces correlation. This makes it
difficult time-wise to do a reasonable job, in a first course, of introducing correlation to 
the student without considerable skipping in the earlier chapters. 

Cornell does the better job of developing analysis of variance logic and methodology, 
though he covers only the same three models as Guilford. Cornell does a careful job of 
developing the notion of expected mean squares and appropriate error terms; he makes 
note of the difference between fixed and random effects, though unfortunately uses as a 
random effect example one that he notes in the footnote is really a fixed effect design. He 
applies analysis of variance to test reliability through the Kuder-Richardson technique, 
but without noting the limitation of the item homogeneity methods to homogeneous power 
tests. 

Cornell’s treatment of special correlation methods and multiple and partial correla- 
tion is compact and without much discussion of appropriate applications. He concludes 
his book with a chapter on “Collecting and Reporting Statistical Data.” Disappointingly, 
this chapter turns out to be mainly a short treatise on how to report survey data in tabular 
form. The area implied by the title of the chapter is important and deserves better
treatment. Be it noted, however, that few texts give even as much as Cornell.

Comparing the two books, one concludes that Cornell’s main concern is to develop
the logic and structure of the basic statistical methods in a relatively uncomplicated fashion, 
with secondary concern for applications. His book seems best adapted to a two-semester 
course for which students are more or less committed to the two semesters. Guilford’s, 
on the other hand, can be used for either a first or second course, with the two courses 
being relatively independent. Guilford is concerned primarily with applications, secondarily 
with the theory and logic; but if one looks for a text with its strength in the applications of 
correlation and regression to prediction and selection, to test reliability and validity, there 
is likely no better on the market. 


John E. ALMAN
Boston University 


FREDERICK HERZBERG, BERNARD MAUSNER, AND BARBARA BLOCH SNYDERMAN. The
Motivation to Work. New York: John Wiley & Sons, Inc., 1959. Pp. vii + 157. 


Motivation is a much talked about topic on which little light has been shed in recent 
years by either theoretical discussions or analyses of verbal statements of present attitudes.
The authors of this book, after surveying many thousands of articles written in this field, 
selected a fresh approach for their research. 

A semi-structural interview was used for a group of 200 engineers and accountants 
who were asked to tell stories about times when they felt exceptionally good or bad about 
their jobs. Content analysis procedures were used on these 476 stories to identify and 
classify “thought units” which yielded information regarding high and low morale periods,
the relevant job factors, resulting attitudes, and their effects or consequences. Only a 
small number of highly interrelated factors were found responsible for good feelings about 
jobs. The key to understanding positive job feelings is found in a sense of personal growth 
and self-actualization resulting from achievement, responsibility, work itself, and advance- 
ment. 

A further conclusion indicates that these job satisfiers deal with factors involved 
in doing the job; job dissatisfiers deal with factors defining the job context including poor 
working conditions, bad company policies and administration, and bad supervision. But
good working conditions, good company policies and administration, and good supervision
were not found to lead to positive job attitudes. 

The interpretation here is that jobs must be restructured to increase the proportion of 
workers who can derive positive motivation from doing their jobs. The other side of the job 
situation is regarded as primarily a hygiene situation, analogous to medical hygiene in 
which certain conditions are established to prevent job dissatisfaction, with little positive 
effect in developing high motivation. Specifically, recognition from the supervisor is seen 
as a motivator only when related to aspects of job achievement leading to feelings of personal 
growth and increasing responsibility. 

This book is important both in terms of its methodological innovations in collecting 
experiences, judgments, and observations in a somewhat more objective way than usual, 
and in its use of principles of sampling, directed observations, and detailed reports as a 
basis for analysis and interpretation of the results. The results should provide new insights 
and stimulate additional productive studies of the motivation to work. 


John C. FLANAGAN
American Institute for Research
and University of Pittsburgh


C. WEST CHURCHMAN AND PHILBURN RATOOSH (Eds.). Measurement: Definitions and
Theories. New York: John Wiley & Sons, 1959. Pp. viii + 274. 


A five-part symposium on measurement was held during the AAAS meetings in 
December of 1956; the papers were assembled in this slender book. Psychology and eco- 
nomics team up to study utility by psychophysical methods, physics turns to the issues 
of quantum theory, philosophers discuss fundamental physical measurement of the classical 
sort, and a mathematician and a logician consider what measurements do within a formal 
system. On the periphery, Professors of Business Administration write of practical matters, 
and a little of the statistics of extreme values is discussed in a fairly practical vein. 

On utility one finds S. S. Stevens giving history, polemic, and his findings in psycho-
physics, and a rather controversial approach to utility measurement. Where and how 
magnitude estimates of feelings of happiness will appear in the future of utility measure- 
ment, one does not quite know, though it is interesting to suppose that utility varies as 
the square root of dollars. The defense offered by Stevens is his usual one that subjective 
states once judged by the observer gain acceptable scientific status through the judgments. 

Three other papers take up utility measurement in ways closer to the von Neumann- 
Morgenstern theory of games. Those writers, it will be recalled, would measure utility by 
gambles, matching a high probability of this against a low probability of that and observing 
indifference points; thus desire is measured against probability. The papers in this book 
deal with subjective probability rather than objective, and all are concerned with incon- 
sistency of choices. These advances are based on Savage’s introduction of subjective prob- 
ability and on the early experiment by Mosteller and Nogee in which people showed, not 
discrete indifference points, but the usual psychophysical ogives when choosing between 
gambles. 

Luce takes the most extremely probabilistic position, supposing that differences in 
utility are directly represented in probabilities of choice, and that choices on gambles can 
be decomposed into probable choices of the objects and probable judgments on chance 
events. The result is, in part, that some reasonable axioms may lead to unreasonable 
conclusions—a result more fully developed and considered in Luce’s recent book. One’s 
alternatives are to find the holes in the axioms or to replace axioms with experimental data. 

Davidson and Marschak, philosopher and economist, respectively, report a psycho- 
logical experiment which aims at measurement of the utility of small amounts of money. 
Their models make utility less precisely dependent on probabilities of choice than Luce’s, 
though the general approach is similar. Unfortunately, the caution in mathematical formu- 
lation and the efforts to make the choices independent lead to rather weak conclusions. 
The latter hinge only on the frequency of intransitive relationships, tested by ingenious 
but rather indecisive statistical methods. 

Luce as well as Davidson and Marschak are somewhat dubious about the assumption 
that utility can be measured by probabilities of choice. Coombs reports his experiment 
which explicates such doubts. The point seems to be that choices become unstable if utility 
differences are small or if the objects chosen are dissimilar. This idea appears in economics 
under the title of “noncomparability of utilities” in one of its uses, and threatens the
stochastic theories of utility measurement. 

Stevens wants numbers and gets them in the most direct way, though one is left 
somewhat at a loss as to their immediate use. The other writers have various versions of 
a theory of decision and choice, and seek numbers to fit. They are, as it were, estimating 
parameters of a theoretical model; they assume the model throughout their endeavors. 
The other authors have something to say about the question of priority of measurement 
or theory in the context of physics, and perhaps their ideas will be of interest. 

Philosopher Caws points out that one must define what he will measure, and thus
must solve many theoretical problems before measurements have clear meaning. Pap
supports this point by showing that “operational definitions” by measurement are always
incomplete, leaving the scientist free to change theory, definition, or measuring technique 
when he is in trouble. The measurements seem to have no special status. Physicists Margenau 
and McKnight, discussing measurement from the point of view of quantum theory, show 
a definite tendency to subordinate measurement issues to theoretic ones, especially in 
interpreting those peculiar proposals about the precision of measurements which are at 
the foundation of quantum theory. Both prefer to say that quantum theory explains why 
measurements come out as they do, rather than taking the position that the limitation of 
measurements forces the theory into its present form. 

Some problems of measurement theory arise when one inserts the measurements 
into a mathematical structure. While a scientist may not notice it, the fact is that measure- 
ments often have a rather equivocal position in the formal system, being variables but not 
mathematical variables, and carrying with them peculiar nonmathematical labels like 
pounds, feet per second, or responses per minute. Menger’s paper on the mathematical 
status of measurements was of help to the reviewer in untangling part of this skein; his 
functional notation seems an important bid to clear away some of the complicated debris of 
unfortunate notation appearing in mathematical texts. It is part of the current move to 
clarity through abstraction and precision, which seems likely to make mathematics more 
available to our students than it was to us. Suppes, in a logical paper, struggles with the 
meaningfulness of sentences like “the mass of the sun is 10°,” which lacks a unit of measure- 
ment. His conclusion indicates that one must be clear as to the mathematical status of 
measurements before deducing, or one’s logic may turn out senseless. 

It is clear enough, from the philosophical arguments and the statements of the 
physicists, that measurement is considered a mere link in a larger enterprise anchored 
both in fact and in theory. The article of faith of the psychological empiricist is that one 
collects plenty of data and uses precise measurements and then opens his lap for the forth- 
coming understanding. Neither philosopher nor physicist seems willing to go along with 
this strategy, for as they see the game any piece may be sacrificed for sufficient advantage, 
or at least one would not lose the game to protect the queen. No degree of precision in 
measuring the wrong thing makes it right, and no level of reliability is high enough to 
ensure validity. 

All in all this is a book to be read in short pieces and thought about, for the reader 
is left free to interpret what it all means. There are enough problems, challenges, and 
suggestions to make the book worth its high price to some readers, even though many of 
the writers have fuller discussions of the same material elsewhere. 

Frank RESTLE 
Michigan State University 


E. Bodewig. Matrix Calculus. Amsterdam: North-Holland Publishing Company; New 
York: Interscience Publishers, Inc., 1956. Pp. xi + 334. 


Robert M. Thrall and Leonard Tornheim. Vector Spaces and Matrices. New York: 
John Wiley & Sons, Inc., 1957. Pp. xii + 318. 


Both of these books are intended as textbooks for an advanced course in mathematics; 
neither is explicitly concerned with practical applications of linear algebra to analysis of 
empirical data. Considered together, these books exemplify the fact that approaches to 
linear algebra may be made at various levels of abstraction and generality. The lowest 
level is exemplified by texts on factor analysis in which matrices are conceived as rectangular 
arrays of numbers and generous use is made of summation signs in developing matrix 
operations; emphasis is on the scalar elements. At the next level vectors are defined as 
n-tuples and vectors are the basic elements in conceptual development and operations in 
linear algebra. Bodewig’s text is written largely at this level. At still higher levels the 
axiomatic method is used to develop theory over an arbitrary field, at first without groups 
and operators. Later, vectors and matrices disappear as basic elements and are replaced 
by linear transformations and bilinear functions. Finally, the theory of vector spaces 
becomes a specialization of the theories of groups, rings, and algebras. 

Thrall and Tornheim proceed at two levels—one concrete, the other axiomatic. 
Starting with vectors as n-tuples, they abstract a set of axioms for a vector space and 
discuss properties of a vector space at both levels. They believe that a number of pedagogical 
advantages result from such a dual approach. This reviewer agrees, especially where such 
a text is used by an insightful instructor teaching students with the presumed capability 
and motivation to profit from such studies. 

Bodewig proceeds primarily at a single, relatively concrete level. His aim may be 
contrasted with that of Thrall and Tornheim. Bodewig’s point is that notation in matrix 
theory developed from that of determinants, with emphasis on scalar elements rather than 
on the “true building blocks of a matrix,” the rows and columns. He feels that matrix 
theory was thus forced into a Procrustean bed, with a resulting “discrepancy between 
thought and calculation.” Bodewig is thus concerned with introducing a notation that is 
both aesthetically satisfactory and practical. Extensive use is made of unity vectors in 
which all elements but one are zero and the remaining element unity. Even though concepts 
and operations are more important than symbols, the early history of the differential 
calculus reminds us of the importance of notation in facilitating elegance and efficiency. 

The two books differ considerably in the topics covered and in the manner of treat- 
ment of their common topics. After introducing vectors, matrices, and his unity vector 
notation, Bodewig discusses eigenvalues. He proceeds in parts II and III to a consideration 
of direct and iterative methods for the solution of simultaneous linear equations and for 
matrix inversion. Part IV, eigenproblems, deals with iterative and direct methods for 
obtaining eigenvalues and eigenvectors. This book will be of considerable reference value, 
in conjunction with books on numerical analysis, to persons dealing with applied problems 
on electronic computers. 

Thrall and Tornheim proceed from the concept of vector spaces and their properties 
to the study of mappings from one vector space into another. This topic is dealt with 
abstractly as linear transformations and concretely as matrices. This is followed by con- 
siderations of equivalence, partition, canonical forms, invariance, and similarity of matrices. 
A chapter on vector functions includes bilinear, quadratic, and Hermitian forms and 
functions. The concepts of vector length and Euclidean space are considered next with 
orthogonal and unitary equivalence. After a discussion of polynomial rings and matrix 
equivalence over a ring, the book concludes with a chapter on linear inequalities, with 
reference to linear programming, the minimax theorem, and matrix games. 

Readers of Psychometrika should find both volumes interesting and enlightening. 
Both should aid the psychometrician in meeting the challenge to become aware of structures 
and meanings in higher mathematics, and to use such insight to search for more funda- 
mental and abstract types of order in empirical domains. These books should provide a 
thorough treatment of selected mathematical topics in the doctoral programs of those 
who will be concerned with the development of psychology as a rational science in this 
age of electronic computers. 


John A. CREAGER 
Personnel Laboratory, Lackland Air Force Base 











